Exact ℓ0-norm optimization via branch-and-bound methods
Sébastien Bourguignon
Laboratoire des Sciences du Numérique de Nantes, École Centrale de Nantes
GdR MIA, Thematic day on Non-Convex Sparse Optimization, Toulouse, October
- S. Bourguignon, ℓ0-norm optimization & branch-and-bound
Outline

1. Why? Exact solutions to ℓ0-norm problems may achieve better estimates.
2. Who? Small- to moderate-size sparse problems can be solved exactly.
3. How? A dedicated branch-and-bound strategy.
4. Where? Directions for further work.
Exactness: exact criterion, exact optimization

True, unrelaxed ℓ0-"norm" criterion¹:

  ‖x‖₁ = Σ_p |x_p|,   ‖x‖_q^q = Σ_p |x_p|^q,   ‖x‖₀ := Card{ p | x_p ≠ 0 }

[Figure: some sparsity-enhancing functions Σ_p ϕ(|x_p|) and their unit balls.]

Global optimization: optimality is guaranteed by the algorithm.

¹ "We don't (re)lax anything!"
Exactness may be worth it. . .

Natural formulations for many problems:

  P2/0 : min_{x∈R^P} ½ ‖y − Ax‖₂²  s.t.  ‖x‖₀ ≤ K
  P0/2 : min_{x∈R^P} ‖x‖₀  s.t.  ½ ‖y − Ax‖₂² ≤ ε
  P2+0 : min_{x∈R^P} ½ ‖y − Ax‖₂² + λ ‖x‖₀

Global optimum → better solution [Bertsimas et al., 2016; Bourguignon et al., 2016]

[Figure: sparse deconvolution example, five panels.]
  Data and truth:  ‖y − H x̊‖₂² = 1.62
  OMP:             ‖y − H x̂‖₂² = 6.07
  ℓ1 relaxation:   ‖y − H x̂‖₂² = 2.36
  SBR:             ‖y − H x̂‖₂² = 2.22
  Global optimum:  ‖y − H x̂‖₂² = 1.43

Results taken from [Bourguignon et al., 2016]
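As a toy illustration of this comparison (not the talk's solver or data), the sketch below solves P2/0 exactly by exhaustive support enumeration on a small random instance, assumed here for concreteness, and checks that the global optimum can only match or beat a greedy method such as OMP:

```python
# Illustrative sketch: exact l0 minimization by support enumeration vs OMP.
# All sizes and data below are assumptions made for the demo.
import itertools
import numpy as np

def residual_on_support(A, y, S):
    """Least-squares value 0.5*||y - A_S x_S||^2 on a fixed support S."""
    if not S:
        return 0.5 * y @ y
    As = A[:, list(S)]
    xS, *_ = np.linalg.lstsq(As, y, rcond=None)
    r = y - As @ xS
    return 0.5 * r @ r

def exact_l0(A, y, K):
    """Solve P2/0 exactly by enumerating all supports of size <= K."""
    P = A.shape[1]
    best_val, best_S = residual_on_support(A, y, ()), ()
    for k in range(1, K + 1):
        for S in itertools.combinations(range(P), k):
            val = residual_on_support(A, y, S)
            if val < best_val:
                best_val, best_S = val, S
    return best_val, best_S

def omp(A, y, K):
    """Plain Orthogonal Matching Pursuit: greedy support selection."""
    S, r = [], y.copy()
    for _ in range(K):
        S.append(int(np.argmax(np.abs(A.T @ r))))
        xS, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        r = y - A[:, S] @ xS
    return 0.5 * r @ r, tuple(S)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 12))
A /= np.linalg.norm(A, axis=0)          # unit-norm columns
y = A[:, [1, 4, 7]] @ np.array([2.0, -1.5, 1.0]) + 0.05 * rng.standard_normal(20)
v_exact, S_exact = exact_l0(A, y, 3)
v_omp, S_omp = omp(A, y, 3)
print(v_exact, v_omp)                   # the exact value is never worse
```

Since enumeration visits every support of size ≤ K, including OMP's, the exact value is a lower bound on the greedy one by construction.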
. . . but exactness has a price

NP-hard²: ‖x‖₀ ≤ K → C_P^K possible combinations. . . in the worst-case scenario!

Branch-and-bound: eliminate (hopefully huge) sets of possible combinations without resorting to their evaluation.

Moderate-size problems (P ∼ a few hundred, K ∼ a few tens):
◮ one-dimensional problems
◮ deconvolution, time-series spectral analysis, spectral unmixing, . . .
◮ variable/subset selection in statistics

² "One is never good at this computation."
Mixed Integer Programming (MIP) reformulation

(see [Bienstock, 1996; Bertsimas et al., 2016; Bourguignon et al., 2016])

Big-M assumption: ∀p, |x_p| ≤ M. Then:

  min_{x∈R^P} ‖y − Ax‖₂²  s.t.  ‖x‖₀ ≤ K,  ∀p, |x_p| ≤ M

  ⇔  min_{b∈{0,1}^P, x∈R^P} ‖y − Ax‖₂²  s.t.  Σ_p b_p ≤ K,  ∀p, |x_p| ≤ M b_p

Can be addressed by MIP solvers (CPLEX, GUROBI, . . . ), but computation time increases quickly and remains limited to small sizes.

Here: no need for a MIP reformulation, nor for binary variables. Specific branch-and-bound construction for problems P2/0, P0/2, and P2+0.
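The equivalence above can be sanity-checked numerically on a tiny instance: enumerating b ∈ {0,1}^P with Σ_p b_p ≤ K, and solving a least-squares problem on each support, gives the same optimum as direct ℓ0 enumeration whenever M is large enough for the box constraint to be inactive. The instance sizes below are illustrative assumptions:

```python
# Sanity check of the big-M reformulation on a tiny random instance,
# assuming M large enough that the box constraint never binds.
import itertools
import numpy as np

def ls_value(A, y, support):
    """Least-squares value and minimizer restricted to a support."""
    if not support:
        return 0.5 * y @ y, np.zeros(0)
    As = A[:, support]
    xs, *_ = np.linalg.lstsq(As, y, rcond=None)
    r = y - As @ xs
    return 0.5 * r @ r, xs

rng = np.random.default_rng(1)
N, P, K, M = 15, 8, 2, 100.0            # M chosen large: box inactive
A = rng.standard_normal((N, P))
y = rng.standard_normal(N)

# MIP side: enumerate binary b with sum(b) <= K; x free on support(b).
best_mip = np.inf
for b in itertools.product([0, 1], repeat=P):
    if sum(b) > K:
        continue
    support = [p for p in range(P) if b[p] == 1]
    val, xs = ls_value(A, y, support)
    if np.all(np.abs(xs) <= M):         # big-M box check (inactive here)
        best_mip = min(best_mip, val)

# Direct l0 side: enumerate supports of size <= K.
best_l0 = min(ls_value(A, y, list(S))[0]
              for k in range(K + 1)
              for S in itertools.combinations(range(P), k))
print(best_mip, best_l0)
```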
Branch-and-Bound resolution [Land & Doig, 1960]

Decision tree over the binary variables. At each node, a lower bound on all subproblems contained in this node is computed: the remaining binary variables are relaxed into [0, 1]. If this bound exceeds the best known solution, the branch is pruned.

[Tree: node P(0) branches on b_{p0} (= 1 → P(1), = 0 → P(4)); P(1) branches on b_{p1} into P(2) and P(3); P(4) branches on b_{p4} into P(5) and P(6).]

⊲ Which variable b_p to branch on? The highest relaxed variable.
⊲ Which side to explore first? b_p = 1.
⊲ Which node to explore first? Depth-first search.
⊲ Computation of relaxed solutions? Related to ℓ1-norm optimization. . .
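A minimal sketch of this scheme, assuming a deliberately weaker but still valid lower bound (plain least squares over the still-allowed columns) in place of the talk's ℓ1/box relaxation; the branching rule, the b_p = 1 first order, and the depth-first exploration follow the slide, so the search stays exact:

```python
# Toy branch-and-bound for P2/0. The node bound used here is the
# unconstrained least-squares value over the allowed columns -- weaker
# than the l1/box relaxation of the talk, but a valid lower bound.
import numpy as np

def ls(A, y, cols):
    """0.5*||y - A_cols x||^2 minimized over x, plus the minimizer."""
    if not cols:
        return 0.5 * y @ y, np.zeros(0)
    xs, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
    r = y - A[:, cols] @ xs
    return 0.5 * r @ r, xs

def bnb(A, y, K):
    P = A.shape[1]
    best = [ls(A, y, [])[0]]            # incumbent: empty support

    def node(S1, free):                 # S1: forced active, free: undecided
        if len(S1) > K:
            return                      # infeasible branch
        if len(S1) == K or not free:
            best[0] = min(best[0], ls(A, y, S1)[0])   # leaf evaluation
            return
        lb, x_relax = ls(A, y, S1 + free)             # valid lower bound
        if lb >= best[0]:
            return                      # prune the whole branch
        # branch on the free variable with largest relaxed amplitude
        j = max(range(len(free)), key=lambda i: abs(x_relax[len(S1) + i]))
        p, rest = free[j], free[:j] + free[j + 1:]
        node(S1 + [p], rest)            # explore b_p = 1 first
        node(S1, rest)                  # then b_p = 0 (depth-first)

    node([], list(range(P)))
    return best[0]

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
print(bnb(A, y, 3))
```

Leaves with |S1| < K already cover all smaller supports, since the least-squares minimum over S1 dominates that of any subset of S1.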
MIP continuous relaxation and ℓ1 norm

  P2/0 : min_{x∈R^P, b∈{0,1}^P} ½ ‖y − Ax‖₂²  s.t.  Σ_p b_p ≤ K,  ∀p, −M b_p ≤ x_p ≤ M b_p

  R2/0 : min_{x∈R^P, b∈[0,1]^P} ½ ‖y − Ax‖₂²  s.t.  Σ_p b_p ≤ K,  ∀p, −M b_p ≤ x_p ≤ M b_p

We have³:

  min R2/0 = min_{x∈R^P} ½ ‖y − Ax‖₂²  s.t.  ‖x‖₁ ≤ MK,  ∀p, |x_p| ≤ M.

³ Proof: for a solution (x⋆, b⋆) of R2/0, we have |x⋆| = M b⋆. . .
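For illustration, this equivalent ℓ1/box-constrained least-squares problem can also be solved by a simple projected-gradient sketch (the talk uses homotopy continuation instead): the projection onto the intersection of the ℓ1 ball and the box has a closed componentwise form, clip(soft(v_p, μ), −M, M), with μ ≥ 0 found by bisection. All sizes below are illustrative assumptions:

```python
# Projected gradient for min 0.5||y - Ax||^2 s.t. ||x||_1 <= M*K, |x_p| <= M.
# Simple alternative sketch, not the talk's homotopy algorithm.
import numpy as np

def project_l1_box(v, tau, M):
    """Euclidean projection onto {x : ||x||_1 <= tau, ||x||_inf <= M}."""
    def shrink(mu):
        return np.sign(v) * np.minimum(M, np.maximum(np.abs(v) - mu, 0.0))
    if np.abs(shrink(0.0)).sum() <= tau:
        return shrink(0.0)              # box clipping alone is feasible
    lo, hi = 0.0, np.abs(v).max()       # bisection on the l1 multiplier mu
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if np.abs(shrink(mid)).sum() > tau:
            lo = mid
        else:
            hi = mid
    return shrink(hi)                   # hi side is always feasible

def relaxed_p20(A, y, K, M, iters=500):
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = project_l1_box(x - (A.T @ (A @ x - y)) / L, M * K, M)
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 12))
y = rng.standard_normal(30)
x = relaxed_p20(A, y, K=3, M=0.5)
print(np.abs(x).sum(), np.abs(x).max())
```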
Continuous relaxation within the branch-and-bound procedure

At a given node:
◮ b_{S0} = 0 and x_{S0} = 0 (variables fixed to zero)
◮ b_{S1} = 1 and |x_{S1}| ≤ M (variables fixed as active)
◮ b_S free and |x_S| ≤ M b_S (undecided variables)

so that Σ_p b_p = Card S1 + Σ_{p∈S} b_p.

The relaxed problem at node i reads equivalently:

  R(i)2/0 : min_{x_S, x_{S1}} ½ ‖y − A_S x_S − A_{S1} x_{S1}‖₂²
            s.t.  ‖x_S‖₁ ≤ M (K − Card S1),  ‖x_S‖∞ ≤ M,  ‖x_{S1}‖∞ ≤ M

⊲ Least squares, with a (partial) ℓ1-norm constraint and box constraints.
⊲ No binary variables!
Optimization with a (partial) ℓ1 norm and box constraints

Homotopy continuation principle.

Standard case [Osborne et al., 2000]:

  min_x ½ ‖y − Ax‖₂² + λ ‖x‖₁

With free variables and box constraints:

  min_{x_S, x_{S1}} ½ ‖y − A_S x_S − A_{S1} x_{S1}‖₂² + λ ‖x_S‖₁
  s.t.  ‖x_S‖∞ ≤ M,  ‖x_{S1}‖∞ ≤ M

[Figure: piecewise-linear solution paths λ ↦ x⋆(λ), with breakpoints λ(0), λ(1), . . . ; in the box-constrained case the paths saturate at ±M.]
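In the standard case, the piecewise-linear path has a closed form when A has orthonormal columns: x⋆(λ) is the soft-thresholding of Aᵀy, and the breakpoints λ(j) are the sorted magnitudes |a_pᵀ y|. The check below verifies the lasso optimality conditions along such a path (textbook case only; it does not cover the box-constrained variant used in the nodes here):

```python
# Closed-form homotopy path for orthonormal A: x*(lambda) = soft(A^T y, lambda).
# We verify A^T (y - A x) lies in lambda * subdiff(||x||_1) between breakpoints.
import numpy as np

rng = np.random.default_rng(4)
A, _ = np.linalg.qr(rng.standard_normal((20, 6)))   # orthonormal columns
y = rng.standard_normal(20)
z = A.T @ y
breakpoints = np.sort(np.abs(z))[::-1]              # lambda(0) > lambda(1) > ...

def lasso_orth(lam):
    """Lasso solution for orthonormal A: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

for lam in 0.5 * (breakpoints[:-1] + breakpoints[1:]):  # between breakpoints
    x = lasso_orth(lam)
    g = A.T @ (y - A @ x)               # equals z - x since A^T A = I
    on = x != 0
    assert np.allclose(g[on], lam * np.sign(x[on]))     # active: |g| = lam
    assert np.all(np.abs(g[~on]) <= lam + 1e-12)        # inactive: |g| <= lam
print(len(breakpoints))
```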
Homotopy continuation

Similarly solves relaxations of the sparsity-constrained problem:

  R(i)2/0 : min_{x_S, x_{S1}} ½ ‖y − A_S x_S − A_{S1} x_{S1}‖₂²
            s.t.  ‖x_S‖₁ ≤ τ⋆,  ‖x_S‖∞ ≤ M,  ‖x_{S1}‖∞ ≤ M

and of the error-constrained problem:

  R(i)0/2 : min_{x_S, x_{S1}} ‖x_S‖₁
            s.t.  ½ ‖y − A_S x_S − A_{S1} x_{S1}‖₂² ≤ ε⋆,  ‖x_S‖∞ ≤ M,  ‖x_{S1}‖∞ ≤ M

[Figure: Pareto curve of ½ ‖y − A_{S1} x⋆_{S1} − A_S x⋆_S‖₂² versus ‖x⋆_S‖₁, with breakpoints λ(0), . . . , λ(4), λ⋆, and the levels τ⋆ and ε⋆.]
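The Pareto curve also ties the problem forms together at the ℓ0 level: solving P2/0 exactly for K = 0, 1, 2, . . . yields a non-increasing sequence of optimal residuals, and the P0/2 optimum for a given ε is the smallest K whose residual falls below it. A brute-force illustration on a toy instance (sizes are assumptions):

```python
# Tracing the l0 Pareto curve by exhaustive enumeration, and reading off
# the P0/2 optimum as the smallest K whose residual is below epsilon.
import itertools
import numpy as np

def best_residual(A, y, K):
    """Optimal value of P2/0 for cardinality bound K (brute force)."""
    P = A.shape[1]
    vals = [0.5 * y @ y]                # empty support
    for k in range(1, K + 1):
        for S in itertools.combinations(range(P), k):
            As = A[:, list(S)]
            xs, *_ = np.linalg.lstsq(As, y, rcond=None)
            r = y - As @ xs
            vals.append(0.5 * r @ r)
    return min(vals)

rng = np.random.default_rng(5)
A = rng.standard_normal((15, 8))
y = rng.standard_normal(15)
pareto = [best_residual(A, y, K) for K in range(5)]
eps = pareto[2]                         # pick a reachable error level
k_min = min(K for K, v in enumerate(pareto) if v <= eps)   # P0/2 optimum
print(pareto, k_min)
```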
Sparse deconvolution: computing time

Implementation in C++, UNIX desktop machine (single core).
N = 120, P = 100, SNR = 10 dB, 50 instances.

Problem   K    Branch-and-Bound                  MIP (CPLEX 12.8)
               Time (s)  Nodes (×10³)  Failed    Time (s)  Nodes (×10³)  Failed
P2/0      5    0.7       1.28          –         3.0       1.71          –
          7    11.6      17.89         –         16.6      21.51         –
          9    43.5      57.37         9         53.8      72.04         6
P0/2      5    0.1       0.21          –         25.7      6.71          –
          7    0.9       2.32          –         114.8     49.54         2
          9    2.5       5.22          –         328.2     101.07        17
P2+0      5    1.8       2.01          –         3.2       1.98          –
          7    7.3       10.20         –         7.4       9.61          –
          9    25.6      31.80         5         17.3      23.74         2

Remark: C_100^5 ∼ 7.5·10⁷, C_100^7 ∼ 1.6·10¹⁰, C_100^9 ∼ 1.9·10¹².

⊲ Competitive exploration strategy (despite its simplicity)
⊲ Efficient node evaluation (especially for small K)
⊲ Performance ≥ CPLEX, and ≫ CPLEX for P0/2
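The combinatorial orders of magnitude in the remark can be checked directly:

```python
# Number of supports of size exactly K among P = 100 unknowns.
import math

for K in (5, 7, 9):
    print(K, math.comb(100, K))
```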
Subset selection problems (framework of [Bertsimas et al., 2016])

N = 500, P = 1 000, SNR = 8 dB, 50 instances.
a_p ∼ N(0, Σ) with σ_ij = ρ^|i−j|;  y = A x⁰ + ε, with equispaced non-zero values in x⁰ equal to 1.

[Figure: exact support recovery (%) versus sparsity degree K, for ρ = 0.8 and ρ = 0.95.]

⊲ Better estimation than standard methods (even with a limited time budget)
⊲ Performance ≫ CPLEX
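The dictionary model above can be reproduced at a reduced, illustrative size by sampling rows from N(0, Σ) through a Cholesky factor of the Toeplitz correlation matrix (function name and sizes are assumptions for the sketch):

```python
# Generating a dictionary with columnwise correlation Sigma_ij = rho^|i-j|,
# the correlated setting in which greedy and l1 methods tend to fail.
import numpy as np

def correlated_dictionary(N, P, rho, rng):
    idx = np.arange(P)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # Toeplitz correlation
    C = np.linalg.cholesky(Sigma)
    return rng.standard_normal((N, P)) @ C.T            # rows ~ N(0, Sigma)

rng = np.random.default_rng(6)
A = correlated_dictionary(N=200, P=50, rho=0.8, rng=rng)
emp = np.corrcoef(A, rowvar=False)      # empirical column correlations
print(emp[0, 1])                        # close to rho = 0.8 for large N
```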
Subset selection: computing times

N = 500, P = 1 000, SNR = 8 dB, ρ = 0.8, 50 instances.

Problem   K    Branch-and-Bound                          CPLEX
               Time (s)  Nodes (×10³)  T/N (s)  Failed   Time (s)  Nodes (×10³)  T/N (s)  Failed
P2/0      5    0.3       0.01          0.02     –        80.0      0.01          6.07     –
          9    7.4       0.15          0.05     –        265.8     0.18          1.45     –
          13   54.6      0.69          0.08     6        786.0     0.84          0.93     46
          17   624.4     5.82          0.11     43       –         –             –        50
P0/2      5    0.2       0.01          0.02     –        1 495.7   0.32          1.54     11
          9    16        0.21          0.08     –        –         –             –        50
          13   311.5     2.32          0.13     12       –         –             –        50
          17   694.3     4.90          0.14     47       –         –             –        50
P2+0      5    0.3       0.03          0.01     –        235.3     0.04          6.00     –
          9    14.3      0.95          0.02     –        549.1     1.29          0.42     4
          13   855.3     34.52         0.02     32       982.6     3.61          0.27     49
          17   996.9     42.49         0.02     41       –         –             –        50

⊲ CPLEX performance drops sharply.
⊲ Resolution guaranteed up to K ∼ 15.
Conclusions

Exact ℓ0-norm solutions may be worth computing. A specific branch-and-bound method ≫ generic MIP resolution.

◮ Cardinality-constrained, error-constrained, and penalized problems
◮ Efficiency depends on K and P, but also on the noise level and on the correlation level in A
◮ Partial exploration → competitive solutions (although not guaranteed)

⊲ More sophisticated branching rules, local heuristics, cutting planes, . . .
⊲ Improve relaxations → non-convex relaxations! [Figure: penalties |x|, |x|^p, CEL0(x).]
Thank you.

ANR JCJC program, 2016-2021, MIMOSA: Mixed Integer programming Methods for Optimization of Sparse Approximation criteria.

⊲ S. Bourguignon, J. Ninin, H. Carfantan and M. Mongeau. IEEE Transactions on Signal Processing, 2016.
⊲ R. Ben Mhenni, S. Bourguignon and J. Ninin. Optimization Methods and Software, submitted.
⊲ Matlab and C codes available upon request.