Imposing Hard Constraints on Deep Networks: Promises and Limitations



SLIDE 1

Imposing Hard Constraints on Deep Networks: Promises and Limitations

min_w R(w)  s.t.  C(w) = 0

P. Marquez-Neila, M. Salzmann, and P. Fua
EPFL, Switzerland

SLIDE 2

Motivation: 3D Pose Estimation


Given a CNN trained to predict the 3D locations of the person’s joints:

  • Can we increase precision by using our knowledge that her left and right limbs are of the same length?

  • If so, how should such constraints be imposed?
SLIDE 3

In Shallower Times

Constraining a Gaussian Latent Variable Model to preserve limb lengths:

  • Better constraint satisfaction without sacrificing accuracy.
  • Can this be repeated with CNNs?

Varol et al., CVPR’12

Regression from PHOG features.

[Plots: reconstruction error (left) and constraint violation (right) vs. N, with V = 75, comparing Yao11 and Ours.]

SLIDE 4

Standard Formulation


Given

  • a Deep Network architecture φ(·, w),
  • a training set S = {(x_i, y_i), 1 ≤ i ≤ N} of N pairs of input and output vectors x_i and y_i,

find w* = argmin_w R_S(w), with

  R_S(w) = (1/N) Σ_i L(φ(x_i, w), y_i).
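For concreteness, a minimal PyTorch sketch of this objective (my own naming, with an L2 loss standing in for L; not the authors' code):

```python
import torch

# Minimal sketch (assumed setup): the unconstrained objective
# R_S(w) = (1/N) * sum_i L(phi(x_i, w), y_i), with an L2 loss for L.
def empirical_risk(phi, xs, ys):
    preds = phi(xs)                              # phi(x_i, w) for all i
    per_sample = ((preds - ys) ** 2).sum(dim=1)  # L(phi(x_i, w), y_i)
    return per_sample.mean()                     # average over the N samples
```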

SLIDE 5

Adding Constraints


Hard Constraints:

Given a set of unlabeled points U = {x′_k, 1 ≤ k ≤ |U|}, find

  min_w R_S(w)  s.t.  C_jk(w) = 0  ∀j ≤ N_C, ∀k ≤ |U|,

where C_jk(w) = C_j(φ(x′_k; w)).

Soft Constraints:

Minimize

  R_S(w) + Σ_j λ_j Σ_k C_jk(w)²

over w, where the λ_j parameters are positive scalars that control the relative contribution of each constraint.

In many “classical” optimization problems, hard constraints are preferred because they remove the need to adjust the λ values.
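A matching sketch of the soft-constrained objective (again assumed names, not the authors' code); each constraint function C_j maps a batch of predictions on the unlabeled points to one value per point:

```python
import torch

# Minimal sketch (assumed setup): R_S(w) + sum_j lambda_j * sum_k C_jk(w)^2,
# where C_jk(w) = C_j(phi(x'_k; w)) on unlabeled inputs xs_unlabeled.
def soft_constrained_risk(phi, xs, ys, xs_unlabeled, constraints, lambdas):
    risk = ((phi(xs) - ys) ** 2).sum(dim=1).mean()  # R_S(w)
    preds_u = phi(xs_unlabeled)                     # phi(x'_k; w) for all k
    penalty = sum(lam * (C(preds_u) ** 2).sum()     # lambda_j * sum_k C_jk^2
                  for C, lam in zip(constraints, lambdas))
    return risk + penalty                           # minimized by SGD/Adam
```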

SLIDE 6

Lagrangian Optimization


Karush-Kuhn-Tucker (KKT) conditions:

  min_w max_Λ L(w, Λ),  with  L(w, Λ) = R(w) + Λᵀ C(w).

Iterative minimization scheme: at each iteration, w ← w + dw, where dw and Λ solve

  [ JᵀJ + ηI    (∂C/∂w)ᵀ ] [ dw ]   [ −Jᵀ r(w_t) ]
  [ ∂C/∂w        0        ] [ Λ  ] = [ −C(w_t)    ]

→ When there are millions of unknowns, these linear systems are HUGE!
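To make the cost concrete, here is a minimal dense sketch of one such update in NumPy (my own toy helper, not the authors' implementation):

```python
import numpy as np

# Minimal sketch (assumed setup): one damped Gauss-Newton/KKT step for
# min_w 0.5*||r(w)||^2 s.t. C(w) = 0, given J = dr/dw and A = dC/dw.
def kkt_step(J, r, A, C, eta=1e-3):
    n, m = J.shape[1], A.shape[0]          # n weights, m constraints
    K = np.zeros((n + m, n + m))
    K[:n, :n] = J.T @ J + eta * np.eye(n)  # damped Gauss-Newton block
    K[:n, n:] = A.T                        # (dC/dw)^T
    K[n:, :n] = A                          # dC/dw; lower-right block stays 0
    rhs = np.concatenate([-J.T @ r, -C])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]                # dw and the multipliers Lambda
```

Storing K takes O((n + m)²) memory and the dense solve O((n + m)³) time, which is exactly what becomes infeasible with millions of weights.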

SLIDE 7

Krylov Subspace Method

Solve Bv = b when the dimension of v is so large that B cannot be stored in memory.

  • Solve the linear system by iteratively finding approximate solutions in the subspace spanned by {b, Bb, B²b, ..., Bᵏb} for k = 0, 1, ..., N.
  • Use the Pearlmutter trick to compute terms of the form vᵀ(∂f/∂w) and (∂f/∂w)v.

→ It can be done!
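A minimal matrix-free sketch of that recipe (my own setup built on SciPy's MINRES; the authors' implementation details may differ). The operator B = JᵀJ + ηI is never formed; it is only applied through Jacobian-vector product oracles, which autodiff supplies via the Pearlmutter trick at the cost of roughly one extra pass each:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

# Minimal sketch (assumed setup): solve (J^T J + eta*I) v = b without
# materializing J. jvp(v) must return J @ v and vjp(u) must return J.T @ u,
# e.g. forward- and reverse-mode autodiff products (the Pearlmutter trick).
def solve_matrix_free(jvp, vjp, b, eta=1e-3):
    n = b.shape[0]
    B = LinearOperator((n, n), matvec=lambda v: vjp(jvp(v)) + eta * v)
    v, info = minres(B, b)  # B is symmetric, so MINRES applies
    return v
```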

SLIDE 8

Results


[Scatter plot: prediction error (mm) vs. median constraint violation (mm) for unconstrained, soft-constrained (Soft-SGD, Soft-Adam), and hard-constrained (Hard-SGD, Hard-Adam) training, over several hyperparameter settings.]

Hard constraints help… but soft constraints help even more!

SLIDE 9

Synthetic Example


Let x_0 and c_i for 1 ≤ i ≤ 200 be vectors of dimension d, and let w* be either

  min_w ½‖w − x_0‖²  s.t.  ‖w − c_i‖ − 10 = 0, 1 ≤ i ≤ 200   (Hard Constraints)

or

  min_w ½‖w − x_0‖² + λ Σ_{1≤i≤200} (‖w − c_i‖ − 10)²   (Soft Constraints)

[Plots: results for d = 2; d = 1e6 with constraint batches; d = 1e6 with all constraints active.]
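A minimal sketch of the soft version of this toy problem (my own parameter choices and plain gradient descent; the slide's exact settings are not recoverable):

```python
import numpy as np

# Minimal sketch (assumed setup): minimize
#   0.5*||w - x_0||^2 + lam * sum_i (||w - c_i|| - 10)^2
# by gradient descent, for random x_0 and c_i of dimension d.
rng = np.random.default_rng(0)
d, n, lam, lr = 2, 200, 0.1, 1e-3
x0 = rng.normal(size=d)
cs = rng.normal(size=(n, d))

w = x0.copy()
for _ in range(5000):
    diffs = w - cs                           # shape (n, d)
    dists = np.linalg.norm(diffs, axis=1)    # ||w - c_i||
    # gradient of (||w - c_i|| - 10)^2 is 2*(dist - 10)*(w - c_i)/dist
    g_pen = 2 * ((dists - 10) / dists)[:, None] * diffs
    w -= lr * ((w - x0) + lam * g_pen.sum(axis=0))

print(np.abs(np.linalg.norm(w - cs, axis=1) - 10).mean())  # mean violation
```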

SLIDE 10

Computational Complexity

" JT J + ηI

∂C ∂w T ∂C ∂w

#  dw Λ

  • =

 −JT r(wt) −C(wt)

At each iteration:

  • N_p is the size of a minibatch, which can be adjusted.
  • N_c is the number of constraints, which is large if they are all active.

SLIDE 11

Interpretation

Hard constraints can be imposed on the output of Deep Nets but they are no more effective than soft ones:

  • Not all constraints are independent of each other, which results in ill-conditioned matrices.
  • We impose the constraints in batches, which means we do not keep a consistent set of them.

→ We might still present this work at a positive-results workshop.


SLIDE 12

Low-Hanging Fruit


Typical approach:

  • Identify an algorithm that can be naturally extended.
  • Perform the extension.
  • Show that your ROC curve is above the others.
  • Publish in CVPR or ICCV.
  • Iterate.
SLIDE 13

Shooting for the Moon


“We choose to go to the Moon in this decade and do the other things, not because they are easy, but because they are hard.” (J.F. Kennedy, 1962)

In the context of Deep Learning:

  • What makes Deep Nets tick?
  • What are their limits?
  • Can they be replaced by something more streamlined?