SLIDE 1

Exact ℓ0-norm optimization via branch-and-bound methods

Sébastien Bourguignon, Laboratoire des Sciences du Numérique de Nantes, École Centrale de Nantes. GdR MIA, Thematic day on Non-Convex Sparse Optimization, Toulouse, October 9th 2020. Joint work with Ramzi Ben Mhenni (LS2N-ECN, now LITIS, Université de Rouen), Jordan Ninin (Lab-STICC / ENSTA Bretagne), Marcel Mongeau (Université de Toulouse, ENAC), Hervé Carfantan (Université de Toulouse, IRAP).

SLIDE 2-3

Outline

1. Why? Exact solutions to ℓ0-norm problems may achieve better estimates.

2. Who? Small- to moderate-size sparse problems can be solved exactly.

3. How? A dedicated branch-and-bound strategy.

4. Where? Directions for further work.

  • S. Bourguignon

ℓ0-norm optimization & branch-and-bound 2 / 16

SLIDE 4

Exactness: exact criterion, exact optimization

True, unrelaxed ℓ0-“norm” criterion¹:

‖x‖₁ = Σ_p |x_p|,  ‖x‖_q^q = Σ_p |x_p|^q,  ‖x‖₀ := Card{p | x_p ≠ 0}

Figure: some sparsity-enhancing functions Σ_p ϕ(|x_p|) and their unit balls.

Global optimization: optimality guaranteed by the algorithm.

¹“On (re)lâche rien!” (“We relax nothing!”)

SLIDE 5

Exactness may be worth it. . .

Natural formulation for many problems:

P2/0 : min_{x∈ℝᴾ} ½‖y − Ax‖₂²  s.t.  ‖x‖₀ ≤ K

P0/2 : min_{x∈ℝᴾ} ‖x‖₀  s.t.  ½‖y − Ax‖₂² ≤ ǫ

P2+0 : min_{x∈ℝᴾ} ½‖y − Ax‖₂² + λ‖x‖₀

• Global optimum → better solution [Bertsimas et al., 2016, Bourguignon et al., 2016]

Figure: sparse deconvolution example, five panels (data and truth, OMP, ℓ1 relaxation, SBR, global optimum), with residuals ‖y − Hx̂‖₂² = 1.62 (truth), 6.07 (OMP), 2.36 (ℓ1), 2.22 (SBR), 1.43 (global optimum).

Results taken from [Bourguignon et al., 2016]

SLIDE 6-8

. . . but exactness has a price

NP-hard²: ‖x‖₀ ≤ K allows up to C(P, K) possible combinations. . . in the worst-case scenario!

Branch-and-Bound: eliminate (hopefully huge) sets of possible combinations without resorting to their evaluation. Moderate-size problems (P ∼ a few hundred, K ∼ a few tens):

◮ one-dimensional problems
◮ deconvolution, time-series spectral analysis, spectral unmixing, . . .
◮ variable/subset selection in statistics

²“On n’est jamais fort pour ce calcul.” (“One is never good at this kind of computation.”)
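To make the combinatorial cost concrete: exact ℓ0 minimization can always be done by enumerating supports and solving a least-squares problem on each. The sketch below is the naive baseline that branch-and-bound aims to beat, not the talk's method; the function name and setup are illustrative only.

```python
import itertools
import numpy as np

def exact_l0_bruteforce(y, A, K):
    """Exact solution of min 0.5*||y - Ax||^2 s.t. ||x||_0 <= K
    by enumerating every support of size <= K: sum_k C(P, k) least-squares fits."""
    P = A.shape[1]
    best_val, best_supp = 0.5 * float(y @ y), ()  # empty support
    for k in range(1, K + 1):
        for supp in itertools.combinations(range(P), k):
            As = A[:, list(supp)]
            xs, *_ = np.linalg.lstsq(As, y, rcond=None)
            r = y - As @ xs
            val = 0.5 * float(r @ r)
            if val < best_val:
                best_val, best_supp = val, supp
    return best_val, best_supp
```

On a noiseless toy instance this recovers the true support exactly, but the loop count explodes with P and K, which is precisely the motivation for pruning.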

SLIDE 9

Mixed Integer Programming (MIP) reformulation

(see [Bienstock 1996, Bertsimas et al. 2016, Bourguignon et al. 2016])

Big-M assumption: ∀p, |x_p| ≤ M. Then:

min_{x∈ℝᴾ} ‖y − Ax‖₂²  s.t.  ‖x‖₀ ≤ K,  ∀p, |x_p| ≤ M

⇔

min_{b∈{0,1}ᴾ, x∈ℝᴾ} ‖y − Ax‖₂²  s.t.  Σ_p b_p ≤ K,  ∀p, |x_p| ≤ M b_p

Can be addressed by MIP solvers (CPLEX, GUROBI, . . . ), but computation time grows quickly / limited to small sizes.

Here: no need for a MIP reformulation nor binary variables. Specific Branch-and-Bound construction for problems P2/0, P0/2, and P2+0.
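The equivalence can be checked directly; a short standard argument, following the big-M construction above:

```latex
% Forward: if b \in \{0,1\}^P with \sum_p b_p \le K and |x_p| \le M b_p,
% then b_p = 0 \Rightarrow x_p = 0, hence
\|x\|_0 = \mathrm{Card}\{p \mid x_p \neq 0\} \;\le\; \sum_p b_p \;\le\; K.
% Converse: given x with \|x\|_0 \le K and |x_p| \le M for all p,
% take b_p = 1 if x_p \neq 0 and b_p = 0 otherwise; then
\sum_p b_p = \|x\|_0 \le K \quad\text{and}\quad |x_p| \le M b_p \;\;\forall p.
```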

SLIDE 10-13

Branch-and-Bound resolution [Land & Doig, 1960]

Decision tree for binary variables. At each node, a lower bound on all subproblems contained by this node: the remaining binary variables are relaxed into [0, 1]. If this bound exceeds the best known solution, the branch is pruned.

Figure: decision tree with nodes P(0). . . P(6), branching successively on b_{p0}, b_{p1}, b_{p4} ∈ {0, 1}.

⊲ Which variable b_p to branch on? The highest relaxed variable.
⊲ Which side to explore first? b_p = 1.
⊲ Which node to explore first? Depth-first search.
⊲ Computation of relaxed solutions? Related to ℓ1-norm optimization. . .
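The scheme above can be sketched in code. This is a minimal branch-and-bound for P2/0 with a deliberately simplified bound: instead of the ℓ1/big-M relaxation developed on the following slides, each node's lower bound just drops the cardinality constraint and fits least squares with the forced zeros applied (valid but weaker). Names and structure are illustrative, not the talk's implementation.

```python
import numpy as np

def half_ls(y, A, cols):
    """0.5 * ||y - A_S x_S||^2 minimized over coefficients supported on `cols`."""
    cols = list(cols)
    if not cols:
        return 0.5 * float(y @ y)
    As = A[:, cols]
    xs, *_ = np.linalg.lstsq(As, y, rcond=None)
    r = y - As @ xs
    return 0.5 * float(r @ r)

def bnb_l0(y, A, K):
    """Exact min 0.5*||y - Ax||^2 s.t. ||x||_0 <= K by branch-and-bound.
    Depth-first, exploring the b_p = 1 side first, as on the slide."""
    P = A.shape[1]
    best = {"val": np.inf, "supp": set()}

    def explore(S1, S0, depth):
        # Lower bound: least squares with only the forced zeros (S0) applied;
        # every completion of this node satisfies x_{S0} = 0, so its cost >= lb.
        lb = half_ls(y, A, [p for p in range(P) if p not in S0])
        if lb >= best["val"]:
            return  # prune the branch
        if len(S1) == K or depth == P:
            val = half_ls(y, A, S1)  # leaf: undecided variables forced to zero
            if val < best["val"]:
                best["val"], best["supp"] = val, set(S1)
            return
        explore(S1 | {depth}, S0, depth + 1)  # b_p = 1 first
        explore(S1, S0 | {depth}, depth + 1)  # then b_p = 0
    explore(set(), set(), 0)
    return best["val"], best["supp"]
```

Because the bound is valid for every node, pruning never discards the optimum; tighter bounds (the ℓ1 relaxations that follow) prune far more aggressively.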

SLIDE 14

MIP continuous relaxation and ℓ1 norm

P2/0 : min_{x∈ℝᴾ, b∈{0,1}ᴾ} ½‖y − Ax‖₂²  s.t.  Σ_p b_p ≤ K,  ∀p, −M b_p ≤ x_p ≤ M b_p

R2/0 : min_{x∈ℝᴾ, b∈[0,1]ᴾ} ½‖y − Ax‖₂²  s.t.  Σ_p b_p ≤ K,  ∀p, −M b_p ≤ x_p ≤ M b_p

We have³:

min R2/0 = min_{x∈ℝᴾ} ½‖y − Ax‖₂²  s.t.  Σ_p |x_p| ≤ MK,  ∀p, |x_p| ≤ M.

³Proof: for a solution (x⋆, b⋆) of R2/0, we have |x⋆| = M b⋆. . .
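The footnote's proof hint unrolls in one line: at a minimizer of R2/0, each b_p can be pushed down to its smallest feasible value, so

```latex
b_p^\star = \frac{|x_p^\star|}{M}
\quad\Longrightarrow\quad
\sum_p b_p^\star \le K
\;\Longleftrightarrow\;
\sum_p |x_p^\star| \le MK,
```

which is exactly the ℓ1 constraint in the equivalent formulation above.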

SLIDE 15

Continuous relaxation within the branch-and-bound procedure

At a given node:

◮ b_{S0} = 0 and x_{S0} = 0 (variables fixed to zero)
◮ b_{S1} = 1 and |x_{S1}| ≤ M (variables fixed as nonzero)
◮ b_S free and |x_S| ≤ M b_S (undecided variables)

so that Σ_p b_p = Card S1 + Σ_{p∈S} b_p.

Figure: decision tree with nodes P(0). . . P(6), branching on b_{p0}, b_{p1}, b_{p4}.

The relaxed problem at node i reads equivalently:

R(i)2/0 : min_{x_S, x_{S1}} ½‖y − A_S x_S − A_{S1} x_{S1}‖₂²  s.t.  ‖x_S‖₁ ≤ M(K − Card S1),  ‖x_S‖∞ ≤ M,  ‖x_{S1}‖∞ ≤ M

⊲ Least squares, ℓ1 norm (partially) and box constraints.
⊲ No binary variables!

SLIDE 16

Optimization with (partial) ℓ1-norm and box constraints

Homotopy continuation principle.

Standard case [Osborne et al. 2000]:

min_x ½‖y − Ax‖₂² + λ‖x‖₁

With free variables and box constraints:

min_{x_S, x_{S1}} ½‖y − A_S x_S − A_{S1} x_{S1}‖₂² + λ‖x_S‖₁  s.t.  ‖x_S‖∞ ≤ M,  ‖x_{S1}‖∞ ≤ M

Figure: solution paths λ ↦ x⋆(λ), piecewise linear with breakpoints λ(0), λ(1), λ(2), . . . , λ⋆; in the box-constrained case, components saturate at ±M.
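The piecewise-linear path is easiest to see in the special case of orthonormal columns, where the ℓ1-penalized least-squares solution is an explicit soft-thresholding of Aᵀy. This is a toy illustration of the homotopy breakpoints only, not the box-constrained algorithm of the slide; the setup below is made up for the example.

```python
import numpy as np

def soft(u, lam):
    """Soft-thresholding: the l1-penalized LS solution when A has orthonormal columns."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
A = Q[:, :4]                      # orthonormal columns: A.T @ A = I
y = rng.normal(size=6)
c = A.T @ y                       # correlations
bps = np.sort(np.abs(c))[::-1]    # homotopy breakpoints: lambda^(0) > lambda^(1) > ...

# Between consecutive breakpoints the support is constant and the path is linear
# in lambda; one coefficient activates each time lambda crosses a breakpoint.
for k, lam in enumerate(0.5 * (bps[:-1] + bps[1:]), start=1):
    assert np.count_nonzero(soft(c, lam)) == k
```

The general (non-orthogonal, box-constrained) case tracked by homotopy continuation has the same piecewise-linear structure, but the breakpoints must be computed sequentially rather than read off |Aᵀy|.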

SLIDE 17

Homotopy continuation

Similarly solves relaxations for the sparsity-constrained problem:

R(i)2/0 : min_{x_S, x_{S1}} ½‖y − A_S x_S − A_{S1} x_{S1}‖₂²  s.t.  ‖x_S‖₁ ≤ τ⋆,  ‖x_S‖∞ ≤ M,  ‖x_{S1}‖∞ ≤ M

and for the error-constrained problem:

R(i)0/2 : min_{x_S, x_{S1}} ‖x_S‖₁  s.t.  ½‖y − A_S x_S − A_{S1} x_{S1}‖₂² ≤ ǫ⋆,  ‖x_S‖∞ ≤ M,  ‖x_{S1}‖∞ ≤ M

Figure: Pareto curve of ½‖y − A_{S1} x⋆_{S1} − A_S x⋆_S‖₂² versus ‖x⋆_S‖₁, with breakpoints λ(0). . . λ(4), λ⋆ and the target levels τ⋆, ǫ⋆.

SLIDE 18

Sparse deconvolution: computing time

Implementation in C++, UNIX desktop machine (single core).
N = 120, P = 100, SNR = 10 dB, 50 instances.

Problem  K  | B&B: time (s)  nodes (×10³)  failed | CPLEX 12.8: time (s)  nodes (×10³)  failed
P2/0     5  |   0.7    1.28   –  |    3.0    1.71   –
         7  |  11.6   17.89   –  |   16.6   21.51   –
         9  |  43.5   57.37   9  |   53.8   72.04   6
P0/2     5  |   0.1    0.21   –  |   25.7    6.71   –
         7  |   0.9    2.32   –  |  114.8   49.54   2
         9  |   2.5    5.22   –  |  328.2  101.07  17
P2+0     5  |   1.8    2.01   –  |    3.2    1.98   –
         7  |   7.3   10.20   –  |    7.4    9.61   –
         9  |  25.6   31.80   5  |   17.3   23.74   2

Remark: C(100,5) ≈ 7.5·10⁷, C(100,7) ≈ 1.6·10¹⁰, C(100,9) ≈ 1.9·10¹².

⊲ Competitive exploration strategy (despite its simplicity)
⊲ Efficient node evaluation (especially for small K)
⊲ Performance ≥ CPLEX, ≫ CPLEX for P0/2
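The remark's orders of magnitude are easy to verify with Python's `math.comb`:

```python
from math import comb

# Binomial counts from the slide's remark: C(100,5), C(100,7), C(100,9)
for K, approx in [(5, 7.5e7), (7, 1.6e10), (9, 1.9e12)]:
    exact = comb(100, K)
    assert abs(exact - approx) / approx < 0.05  # matches the quoted magnitude
    print(f"C(100,{K}) = {exact:.2e}")
```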

SLIDE 19

Subset selection problems (framework in [Bertsimas et al., 2016])

N = 500, P = 1 000, SNR = 8 dB, 50 instances.
a_p ∼ N(0, Σ) with σ_ij = ρ^|i−j|; y = A x⁰ + ǫ, with equispaced non-zero values in x⁰ equal to 1.

Figure: exact recovery (%) versus sparsity degree K, for ρ = 0.8 and ρ = 0.95.

⊲ Better estimation than standard methods (even with limited time)
⊲ Performance ≫ CPLEX

SLIDE 20

Subset selection: computing times

N = 500, P = 1 000, SNR = 8 dB, 50 instances; results for ρ = 0.8. “–” = not reported (all 50 instances failed).

Problem  K   | B&B: time (s)  nodes (×10³)  T/N (s)  failed | CPLEX: time (s)  nodes (×10³)  T/N (s)  failed
P2/0     5   |    0.3    0.01  0.02   – |    80.0   0.01  6.07   –
         9   |    7.4    0.15  0.05   – |   265.8   0.18  1.45   –
         13  |   54.6    0.69  0.08   6 |   786.0   0.84  0.93  46
         17  |  624.4    5.82  0.11  43 |       –      –     –  50
P0/2     5   |    0.2    0.01  0.02   – |  1495.7   0.32  1.54  11
         9   |   16.0    0.21  0.08   – |       –      –     –  50
         13  |  311.5    2.32  0.13  12 |       –      –     –  50
         17  |  694.3    4.90  0.14  47 |       –      –     –  50
P2+0     5   |    0.3    0.03  0.01   – |   235.3   0.04  6.00   –
         9   |   14.3    0.95  0.02   – |   549.1   1.29  0.42   4
         13  |  855.3   34.52  0.02  32 |   982.6   3.61  0.27  49
         17  |  996.9   42.49  0.02  41 |       –      –     –  50

⊲ CPLEX performance ↓↓
⊲ Resolution guaranteed up to K ∼ 15

SLIDE 21-23

Conclusions

Exact ℓ0-norm solutions may be worth computing. A specific branch-and-bound method ≫ generic MIP resolution.

◮ Cardinality-constrained, error-constrained, and penalized problems
◮ Efficiency depends on K and P, but also on the noise level / correlation level in A
◮ Partial exploration → competitive solutions (although not guaranteed)

⊲ More sophisticated branching rules, local heuristics, cutting planes, . . .
⊲ Improved relaxations → non-convex solutions!

Figure: sparsity-enhancing penalties |x|, |x|^p, and CEL0(x).

SLIDE 24

Thank you. ANR JCJC program, 2016-2021, MIMOSA: Mixed Integer programming Methods for Optimization of Sparse Approximation criteria.

⊲ S. Bourguignon, J. Ninin, H. Carfantan and M. Mongeau. IEEE Transactions on Signal Processing, 2016.
⊲ R. Ben Mhenni, S. Bourguignon and J. Ninin. Optimization Methods and Software, submitted.
⊲ Matlab and C codes available upon request.
