Bilevel Learning of the Group Lasso Structure

SLIDE 1

Bilevel Learning of the Group Lasso Structure

Jordan Frecon (1), Saverio Salzo (1), Massimiliano Pontil (1,2)

(1) CSML - Istituto Italiano di Tecnologia
(2) Dept of Computer Science - University College London

Thirty-second Conference on Neural Information Processing Systems, Montreal, Canada

Jordan Frecon, Saverio Salzo, Massimiliano Pontil NIPS 2018 1 / 9

SLIDE 2

Linear Regression and Group Sparsity

Problem: Predict $y \in \mathbb{R}^N$ from $X \in \mathbb{R}^{N \times P}$.

Linear Regression: Find $w \in \mathbb{R}^P$ such that $y \approx Xw$.

In many applications, only a few groups of features are relevant to predict $y$ ⇒ group-sparse $w$. Examples:

Predict psychiatric disorders from activity in regions of the brain
Predict protein functions from their molecular composition

SLIDE 3

Group Lasso

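Given $\lambda > 0$ and a group-structure $\{G_1, \dots, G_L\}$, find

$$\hat{w} \in \operatorname*{argmin}_{w \in \mathbb{R}^P} \; \frac{1}{2}\|y - Xw\|^2 + \lambda \sum_{l=1}^{L} \|w_{G_l}\|_2$$

[Figure: group-sparse solution $\hat{w}$, with groups G1, G2, G3, G4, G5 marked]

Limitation: The group-structure $\{G_1, \dots, G_L\}$ may be unknown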
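Why the penalty yields group-sparse solutions can be seen from its proximal operator, which is the standard block soft-thresholding map for non-overlapping groups. The sketch below is an illustration and is not taken from the talk; the function names, the example vector, and the threshold `tau` are all hypothetical.

```python
import math

def group_lasso_penalty(w, groups):
    """Sum of the Euclidean norms of the sub-vectors w_G, one term per group G."""
    return sum(math.sqrt(sum(w[i] ** 2 for i in g)) for g in groups)

def block_soft_threshold(w, groups, tau):
    """Proximal operator of tau * group_lasso_penalty (non-overlapping groups).

    Each group is shrunk toward zero as a block; a group whose norm is
    at most tau is set to zero entirely.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * w[i]
    return out

# Hypothetical example: 4 features split into 2 groups.
w = [3.0, 4.0, 0.1, 0.2]
groups = [[0, 1], [2, 3]]
shrunk = block_soft_threshold(w, groups, tau=1.0)
```

Here the first group (norm 5) is scaled by 0.8, while the second group (norm below `tau`) is zeroed out as a whole, which is the group-sparsity effect described on this slide.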

SLIDE 4

Setting

Setting: T Group Lasso problems with a shared group-structure. For all $t \in \{1, \dots, T\}$,

$$\hat{w}_t(\theta) \in \operatorname*{argmin}_{w_t \in \mathbb{R}^P} \; \frac{1}{2}\|y_t - X_t w_t\|^2 + \lambda \sum_{l=1}^{L} \|w_t \odot \theta_l\|_2$$

[Figure: $\theta$ encodes the groups]

Goal: Estimation of the optimal group-structure $\theta^*$
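To make the role of $\theta$ concrete, the sketch below evaluates the penalty $\sum_l \|w \odot \theta_l\|_2$ for one task. It assumes each $\theta_l$ is a 0/1 indicator vector of a group, which the slides do not state explicitly; the names and numbers are hypothetical.

```python
import math

def theta_penalty(w, theta):
    """Sum over l of the Euclidean norm of the elementwise product w * theta[l]."""
    return sum(
        math.sqrt(sum((wi * ti) ** 2 for wi, ti in zip(w, theta_l)))
        for theta_l in theta
    )

w = [3.0, 4.0, 0.0, 12.0, 5.0]

# Assumed binary encoding: theta[l][p] = 1 if feature p belongs to group l.
theta = [
    [1, 1, 0, 0, 0],  # group 1 = features 0, 1
    [0, 0, 1, 1, 1],  # group 2 = features 2, 3, 4
]

penalty = theta_penalty(w, theta)  # norm of (3, 4) plus norm of (0, 12, 5)
```

Under this assumption the penalty coincides with the usual Group Lasso penalty for the groups {0, 1} and {2, 3, 4}, so changing $\theta$ changes the group-structure without changing the form of the problem.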

SLIDE 5

A Bilevel Programming Approach

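Upper-level Problem:

$$\operatorname*{minimize}_{[\theta_1 \cdots \theta_L] \in \Theta} \; U(\theta) := \sum_{t=1}^{T} E_t(\hat{w}_t(\theta)) \quad \text{(e.g., validation error)}$$

where $\hat{w}(\theta) = [\hat{w}_1(\theta) \cdots \hat{w}_T(\theta)]$ solves the

Lower-level Problem (T Group Lasso problems):

$$\operatorname*{minimize}_{w \in \mathbb{R}^{P \times T}} \; \mathcal{L}(w, \theta) := \sum_{t=1}^{T} \left( \frac{1}{2}\|y_t - X_t w_t\|^2 + \lambda \sum_{l=1}^{L} \|\theta_l \odot w_t\|_2 \right)$$

Difficulties:

$\hat{w}(\theta)$ is not available in closed form
$\theta \mapsto \hat{w}(\theta)$ is nonsmooth [⇒ $U$ is nonsmooth]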

SLIDE 6

Approximate Bilevel Problem

Upper-level Problem:

$$\operatorname*{minimize}_{[\theta_1 \cdots \theta_L] \in \Theta} \; U_K(\theta) := \sum_{t=1}^{T} E_t(w_t^{(K)}(\theta))$$

where $w_t^{(K)}(\theta) \to \hat{w}_t(\theta)$

Dual Algorithm:

$u^{(0)}(\theta)$ chosen arbitrarily
for $k = 0, 1, \dots, K - 1$:
    $u^{(k+1)}(\theta) = A(u^{(k)}(\theta), \theta)$   (dual update)
$[w_1^{(K)}(\theta) \cdots w_T^{(K)}(\theta)] = B(u^{(K)}(\theta), \theta)$   (primal-dual relationship)

Goals:

Find $A$ and $B$ smooth [⇒ $w^{(K)}$ is smooth ⇒ $U_K$ is smooth]
Prove that the approximate bilevel scheme converges.
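The control flow of the scheme above (K dual updates followed by one primal-dual map) can be sketched generically. The maps `toy_A` and `toy_B` below are hypothetical stand-ins chosen so the example runs; they are NOT the operators from the talk.

```python
def unrolled_solution(A, B, u0, theta, K):
    """Run K dual updates u <- A(u, theta), then map to the primal via B."""
    u = u0
    for _ in range(K):
        u = A(u, theta)
    return B(u, theta)

# Toy stand-ins (assumptions, not the talk's operators): toy_A is a
# contraction in u whose fixed point is 2 * theta, and toy_B returns u as is.
def toy_A(u, theta):
    return 0.5 * u + theta

def toy_B(u, theta):
    return u

theta = 1.5
# Approximate solutions w^(K)(theta) for increasing K, starting from u0 = 0.
approximations = [
    unrolled_solution(toy_A, toy_B, 0.0, theta, K) for K in (1, 5, 50)
]
```

In this toy case the iterates approach the fixed point 3.0 as K grows, and every $w^{(K)}$ is a smooth (here affine) function of `theta` because it is a finite composition of smooth maps. That is the property the slide asks of A and B.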

SLIDE 7

Contributions

Bilevel Framework for Estimating the Group Lasso Structure

Design of a Dual Forward-Backward Algorithm with Bregman Distances such that

1. $A$ and $B$ are smooth ⇒ $U_K$ is smooth
2. $\min U_K \to \min U$ and $\operatorname{argmin} U_K \to \operatorname{argmin} U$

Implementation of the proxSAGA algorithm: a nonconvex stochastic variant of

$$\theta^{(q+1)} = P_\Theta\left(\theta^{(q)} - \gamma \nabla U_K(\theta^{(q)})\right)$$
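The projected gradient update $\theta^{(q+1)} = P_\Theta(\theta^{(q)} - \gamma \nabla U_K(\theta^{(q)}))$ can be sketched in its plain deterministic form. This is not proxSAGA, which is a stochastic variant. The slides do not specify the constraint set, so the box $[0, 1]^n$ used for the projection is an assumption, and the quadratic objective is a toy stand-in for $U_K$.

```python
def project_box(theta, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (an assumed constraint set)."""
    return [min(hi, max(lo, t)) for t in theta]

def projected_gradient(grad, theta0, gamma, num_steps, project=project_box):
    """Iterate theta <- project(theta - gamma * grad(theta))."""
    theta = list(theta0)
    for _ in range(num_steps):
        g = grad(theta)
        theta = project([t - gamma * gi for t, gi in zip(theta, g)])
    return theta

# Toy smooth objective (hypothetical): 0.5 * sum((theta - target)^2),
# whose gradient is theta - target.
target = [0.3, 1.7, -0.4]

def toy_grad(theta):
    return [t - c for t, c in zip(theta, target)]

theta_star = projected_gradient(toy_grad, [0.5, 0.5, 0.5], gamma=0.5, num_steps=100)
```

For this toy objective the iterates converge to the projection of `target` onto the box: the first coordinate reaches 0.3, while the other two are clipped to the bounds 1.0 and 0.0.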

SLIDE 8

Numerical Experiment

Setting: T = 500 tasks, N = 25 noisy observations, P = 50 features. Estimate and group the features into, at most, L = 10 groups.

SLIDE 9

Conclusion

Thank You

Our poster AB #92 will be presented in Room 210 & 230 at 5pm