
SLIDE 1

A k-norm-based Mixed Integer Programming formulation for sparse optimization

  • M. Gaudioso*
  • G. Giallombardo*
  • G. Miglionico*

*DIMES, Università della Calabria, Rende (CS), Italia

GdR MIA Thematic day on Non-Convex Sparse Optimization Friday October 9th 2020

SLIDE 2

The issues

  • ℓ0 pseudo-norm and the k-norm;
  • A k-norm-based discrete formulation of the sparse optimization problem;
  • Continuous relaxation;
  • Application to Classification.


SLIDE 3

Outline

1. Sparse Optimization and polyhedral k-norm

2. Two Mixed Integer Programming (MIP) formulations for the Sparse Optimization problem

3. SVM classification, Feature Selection and Sparse Optimization

4. Numerical experiments

5. Bibliography



SLIDE 5

Sparse optimization

The sparse optimization problem:

$$f_0^* = \min_{x \in \mathbb{R}^n} f(x) + \|x\|_0 \qquad (P_0)$$

with $f : \mathbb{R}^n \to \mathbb{R}$ convex and not necessarily differentiable, $n \ge 2$. The $\ell_0$ pseudo-norm $\|\cdot\|_0$ counts the number of non-zero components.


SLIDE 6

A class of polyhedral norms

The k-norm of $x$, $\|x\|_{[k]}$, is the sum of the $k$ largest components (in modulus) of $x$, $k = 1, \ldots, n$. The following hold:

i) $\|x\|_\infty = \|x\|_{[1]} \le \ldots \le \|x\|_{[k]} \le \ldots \le \|x\|_{[n]} = \|x\|_1$;

ii) $\|x\|_0 \le k \;\Rightarrow\; \|x\|_1 - \|x\|_{[s]} = 0, \quad k \le s \le n$.

In particular, for $1 \le k \le n$,

$$\|x\|_0 \le k \;\Longleftrightarrow\; \|x\|_1 - \|x\|_{[k]} = 0.$$

The property above allows us to replace the cardinality constraint $\|x\|_0 \le k$ with a constraint on a difference of norms (a Difference of Convex, DC, constraint).
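A minimal NumPy sketch of these facts, on a made-up vector (illustrative only, not from the slides):

```python
import numpy as np

def k_norm(x, k):
    """Polyhedral k-norm: sum of the k largest components of x in modulus."""
    return np.sort(np.abs(x))[::-1][:k].sum()

x = np.array([3.0, 0.0, -1.5, 0.0, 0.5])            # ||x||_0 = 3
norms = [k_norm(x, k) for k in range(1, x.size + 1)]

# i) monotone chain from ||x||_inf up to ||x||_1
assert norms[0] == np.abs(x).max() and np.isclose(norms[-1], np.abs(x).sum())

# ii) cardinality test: ||x||_0 <= k  iff  ||x||_1 - ||x||_[k] = 0
for k in range(1, x.size + 1):
    gap = np.abs(x).sum() - k_norm(x, k)
    print(k, round(gap, 3), np.count_nonzero(x) <= k, np.isclose(gap, 0.0))
```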


SLIDE 7

Differential properties of the k-norm

Let $\bar{x} \in \mathbb{R}^n$ and let $I_{[k]} = \{i_1, \ldots, i_k\}$ be the index set of the $k$ largest components (in modulus) of $\bar{x}$. A subgradient $\bar{g}_{[k]}$ of $\|\cdot\|_{[k]}$ at $\bar{x}$ is

$$\bar{g}_{[k],i} = \begin{cases} 1 & \text{if } i \in I_{[k]} \text{ and } \bar{x}_i \ge 0, \\ -1 & \text{if } i \in I_{[k]} \text{ and } \bar{x}_i < 0, \\ 0 & \text{otherwise.} \end{cases}$$

It holds that

$$\|\bar{x}\|_{[k]} = \max_{y \in \psi_k} y^\top \bar{x},$$

where $\psi_k$ is the subdifferential of $\|\cdot\|_{[k]}$ at the point $0$,

$$\psi_k = \{ y \in \mathbb{R}^n \mid y = u - v, \; 0 \le u, v \le e, \; (u+v)^\top e = k \},$$

and $e$ is the vector of $n$ ones.
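A small NumPy sketch of this subgradient (toy vector, purely illustrative):

```python
import numpy as np

def k_norm_subgradient(x_bar, k):
    """A subgradient of ||.||_[k] at x_bar: the sign of x_bar on the index set
    I_[k] of its k largest-modulus components, and 0 elsewhere."""
    g = np.zeros_like(x_bar)
    idx = np.argsort(-np.abs(x_bar))[:k]            # I_[k]
    g[idx] = np.where(x_bar[idx] >= 0, 1.0, -1.0)
    return g

x_bar = np.array([3.0, 0.0, -1.5, 0.0, 0.5])
g = k_norm_subgradient(x_bar, 2)
# g^T x_bar recovers ||x_bar||_[2], consistent with the max over psi_k
print(g @ x_bar, np.sort(np.abs(x_bar))[::-1][:2].sum())
```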



SLIDE 9

Standard formulation

Introduce a set of binary variables $z_k$, $k = 1, \ldots, n$, as "flags" of the non-zero components of $x$:

$$f_I^* = \min_{x,z} \; f(x) + \sum_{k=1}^{n} z_k$$

$$-M z_k \le x_k \le M z_k, \quad k = 1, \ldots, n$$

$$z_k \in \{0,1\}, \quad k = 1, \ldots, n,$$

where $M$ is the classic "big M" parameter. At the optimum it is $x_k \ne 0 \Leftrightarrow z_k = 1$, hence $\sum_{k=1}^{n} z_k$ is exactly the $\ell_0$ pseudo-norm of $x$.
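As a concrete illustration, here is a minimal sketch of the standard formulation with the PuLP modelling library; it is not the authors' code, and it assumes a linear fitting term $f(x) = \|Ax - b\|_1$ (modelled with slack variables $t$) so that the whole problem is a MILP. The data $A$, $b$ and the value of $M$ are made up.

```python
import numpy as np
import pulp

rng = np.random.default_rng(0)
m, n, M = 8, 5, 10.0
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

prob = pulp.LpProblem("standard_bigM_sparse", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{j}", lowBound=-M, upBound=M) for j in range(n)]
z = [pulp.LpVariable(f"z{j}", cat="Binary") for j in range(n)]   # non-zero flags
t = [pulp.LpVariable(f"t{i}", lowBound=0) for i in range(m)]     # slacks for |Ax - b|

# objective: f(x) = sum_i t_i (i.e. ||Ax - b||_1) plus the cardinality term sum_j z_j
prob += pulp.lpSum(t) + pulp.lpSum(z)
for i in range(m):
    expr = pulp.lpSum(float(A[i, j]) * x[j] for j in range(n)) - float(b[i])
    prob += expr <= t[i]
    prob += -expr <= t[i]
for j in range(n):
    prob += x[j] <= M * z[j]        # big-M linking: x_j != 0 forces z_j = 1
    prob += x[j] >= -M * z[j]

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([v.value() for v in x], int(sum(v.value() for v in z)))   # sparse x and its l0
```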


SLIDE 10

Continuous relaxation of the standard formulation

Replace the binary constraint $z_k \in \{0,1\}$ with $0 \le z_k \le 1$, $k = 1, \ldots, n$. At the optimum at least one of the pair of constraints $-M z_k \le x_k \le M z_k$ is satisfied as an equality, and it is $z_k = \frac{|x_k|}{M}$. Thus we come out with

$$\sum_{k=1}^{n} z_k = \frac{1}{M} \|x\|_1.$$

The objective function of the continuous relaxation is finally

$$F(x) = f(x) + \frac{1}{M} \|x\|_1,$$

which coincides with the classic $\ell_1$ regularization of $f$ (the LASSO approach).
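For concreteness, a tiny sketch of the relaxed objective with a made-up convex loss (illustrative only):

```python
import numpy as np

def F(x, f, M):
    """Continuous relaxation of the standard formulation:
    the LASSO-like objective f(x) + ||x||_1 / M."""
    return f(x) + np.abs(x).sum() / M

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)        # illustrative convex loss
print(F(np.zeros(5), f, M=10.0), F(rng.standard_normal(5), f, M=10.0))
```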


SLIDE 11

The k-norm formulation

Rewrite the equivalence $\|x\|_0 \le k \Leftrightarrow \|x\|_1 - \|x\|_{[k]} = 0$ as $\|x\|_0 > k \Leftrightarrow \|x\|_1 - \|x\|_{[k]} > 0$. Introduce the binary variables $y_k$, $k = 1, \ldots, n$, and define

$$f_{II}^* = \min_{x,y} \; f(x) + \sum_{k=1}^{n} y_k$$

$$\|x\|_1 - \|x\|_{[k]} \le M' y_k, \quad k = 1, \ldots, n$$

$$y_k \in \{0,1\}, \quad k = 1, \ldots, n.$$

At the optimum it is $\|x\|_1 - \|x\|_{[k]} = 0 \Leftrightarrow y_k = 0$.


SLIDE 12

Properties

At the optimum it is $y_n = 0$ and, as long as $x \ne 0$, it is

$$\sum_{k=1}^{n} y_k = \max \{ s \mid \|x\|_1 - \|x\|_{[s]} > 0 \},$$

thus

$$\sum_{k=1}^{n} y_k = \|x\|_0 - 1.$$

Remark: the constraints are of DC type.
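A quick NumPy check of this property on a toy vector (illustrative only):

```python
import numpy as np

def k_norm(x, k):
    return np.sort(np.abs(x))[::-1][:k].sum()

x = np.array([2.0, 0.0, -0.7, 0.0, 0.1])            # ||x||_0 = 3
l1 = np.abs(x).sum()
# y_k = 1 exactly when ||x||_1 - ||x||_[k] > 0, i.e. when the big-M' slack is needed
y = [1 if l1 - k_norm(x, k) > 1e-12 else 0 for k in range(1, x.size + 1)]
print(y, sum(y), np.count_nonzero(x) - 1)           # sum(y) = ||x||_0 - 1
```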


SLIDE 13

Continuous relaxation of the k-norm formulation

Replace the binary constraint $y_k \in \{0,1\}$ with $0 \le y_k \le 1$, $k = 1, \ldots, n$. At the optimum the constraints $\|x\|_1 - \|x\|_{[k]} \le M' y_k$, $k = 1, \ldots, n$, are satisfied as equalities, hence

$$y_k = \frac{1}{M'} \left( \|x\|_1 - \|x\|_{[k]} \right)$$

and

$$\sum_{k=1}^{n} y_k = \frac{1}{M'} \left( n \|x\|_1 - \sum_{k=1}^{n} \|x\|_{[k]} \right).$$


SLIDE 14

Formulation

The relaxation is now

$$\min_{x \in \mathbb{R}^n} \Phi(x), \qquad (1)$$

with

$$\Phi(x) = f(x) + \frac{\sigma}{M'} \left( n \|x\|_1 - \sum_{k=1}^{n} \|x\|_{[k]} \right).$$

Note that the function $\Phi$ is DC, with the k-norms embedded in it.
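A small NumPy sketch of Φ, with a made-up convex loss f and made-up parameters σ, M′ (illustrative only):

```python
import numpy as np

def k_norm(x, k):
    return np.sort(np.abs(x))[::-1][:k].sum()

def Phi(x, f, sigma, M_prime):
    """DC objective of the relaxed k-norm formulation:
    f(x) + (sigma / M') * ( n * ||x||_1 - sum_k ||x||_[k] )."""
    n = x.size
    sum_knorms = sum(k_norm(x, k) for k in range(1, n + 1))
    return f(x) + sigma / M_prime * (n * np.abs(x).sum() - sum_knorms)

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)        # illustrative convex loss
print(Phi(rng.standard_normal(5), f, sigma=1.0, M_prime=50.0))
```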


SLIDE 15

Remarks on the objective function of the relaxation

Letting $|x_{j_1}| \ge |x_{j_2}| \ge \ldots \ge |x_{j_n}|$, it is $\|x\|_{[k]} = \sum_{s=1}^{k} |x_{j_s}|$, hence

$$\sum_{k=1}^{n} \|x\|_{[k]} = n |x_{j_1}| + (n-1) |x_{j_2}| + \ldots + |x_{j_n}|.$$

Finally, taking into account $\|x\|_1 = \sum_{k=1}^{n} |x_{j_k}|$, we obtain

$$\sum_{k=1}^{n} y_k = \frac{1}{M'} \left( n \|x\|_1 - \sum_{k=1}^{n} \|x\|_{[k]} \right) = \frac{1}{M'} \left( |x_{j_2}| + 2 |x_{j_3}| + \ldots + (n-1) |x_{j_n}| \right) = \frac{1}{M'} \sum_{s=2}^{n} (s-1) |x_{j_s}|.$$

The smaller the modulus of a component, the bigger its weight: the penalty pushes towards reduction of the small components of $x$, which is in favour of minimizing the $\ell_0$ pseudo-norm.
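The weighting can be verified numerically (toy vector, illustrative only):

```python
import numpy as np

def k_norm(x, k):
    return np.sort(np.abs(x))[::-1][:k].sum()

x = np.array([0.3, -2.0, 0.0, 1.1, -0.05])
n = x.size
lhs = n * np.abs(x).sum() - sum(k_norm(x, k) for k in range(1, n + 1))
mods = np.sort(np.abs(x))[::-1]                     # |x_{j1}| >= ... >= |x_{jn}|
rhs = sum((s - 1) * mods[s - 1] for s in range(2, n + 1))
print(lhs, rhs)   # identical: the smaller the component, the larger its weight
```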


SLIDE 16

Comparison between the two relaxations-1

The constant $M$ bounds single vector components, while $M'$ bounds norms; thus we set $M' = nM$. The objective function $\Phi$ of (1) becomes:

$$\Phi(x) = f(x) + \frac{\sigma}{M} \|x\|_1 - \frac{\sigma}{nM} \sum_{k=1}^{n} \|x\|_{[k]}.$$

The function $\Phi$ is DC, that is, the difference of two convex functions: $\Phi(x) = f_1(x) - f_2(x)$, with

$$f_1(x) = f(x) + \frac{\sigma}{M} \|x\|_1 \;(= F(x)) \qquad \text{and} \qquad f_2(x) = \frac{\sigma}{nM} \sum_{k=1}^{n} \|x\|_{[k]}.$$


SLIDE 17

Comparison between the two relaxations-2

$f_1$ is exactly the objective function $F(x)$ of the relaxed standard formulation, thus $F(x) - \Phi(x) = f_2(x) \ge 0$. Summing up, $F(x)$ is convex and majorizes the (nonconvex) $\Phi(x)$, with $f_2(x)$ being the nonnegative gap, whose value depends on the sum of the k-norms.
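A numerical confirmation of the relation $F(x) - \Phi(x) = f_2(x) \ge 0$ (the convex term $f$ cancels in the difference, so it is omitted; parameters are made up):

```python
import numpy as np

def k_norm(x, k):
    return np.sort(np.abs(x))[::-1][:k].sum()

sigma, M, n = 1.0, 10.0, 5
rng = np.random.default_rng(1)
x = rng.standard_normal(n)

sum_knorms = sum(k_norm(x, k) for k in range(1, n + 1))
F_minus_Phi = (sigma / M * np.abs(x).sum()
               - sigma / (n * M) * (n * np.abs(x).sum() - sum_knorms))
f2 = sigma / (n * M) * sum_knorms
print(F_minus_Phi, f2)      # equal, and nonnegative
```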


SLIDE 18

The gap function

Proposition. On any sphere $B(0, \rho) = \{x \mid \|x\|_1 = \rho\}$ the gap function $f_2(x)$ achieves its:

i) global maxima at $x_{j_1} = \pm\rho$, $x_{j_2} = \ldots = x_{j_n} = 0$ ($2n$ maxima, with $\ell_0$-norm $= 1$);

ii) global minima at $|x_{j_1}| = \ldots = |x_{j_n}| = \frac{\rho}{n}$ ($2^n$ minima, with $\ell_0$-norm $= n$).

The (subtractive) gap $f_2$ is maximal when $\|x\|_0 = 1$ and minimal when all components are nonzero and equal in modulus, that is $\|x\|_0 = n$. Hence the model exhibits a stronger bias towards reduction of the $\ell_0$ pseudo-norm than the standard one.
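The proposition can be illustrated numerically on the ℓ1 sphere (made-up parameters; a random point lies between the two extremes):

```python
import numpy as np

def k_norm(x, k):
    return np.sort(np.abs(x))[::-1][:k].sum()

def f2(x, sigma=1.0, M=10.0):
    n = x.size
    return sigma / (n * M) * sum(k_norm(x, k) for k in range(1, n + 1))

n, rho = 5, 1.0
sparse_pt = np.zeros(n); sparse_pt[0] = rho          # ||x||_0 = 1 vertex
dense_pt = np.full(n, rho / n)                       # ||x||_0 = n, equal moduli
rng = np.random.default_rng(2)
random_pt = rng.standard_normal(n)
random_pt *= rho / np.abs(random_pt).sum()           # scaled onto ||x||_1 = rho

print(f2(sparse_pt), f2(random_pt), f2(dense_pt))    # max >= middle >= min
```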



SLIDE 20

Binary Classification

Given two point sets in $\mathbb{R}^n$ (the samples)

$$A = \{a_1, \ldots, a_{m_1}\}, \qquad B = \{b_1, \ldots, b_{m_2}\},$$

find a separating hyperplane $H(w, \gamma) = \{x \in \mathbb{R}^n \mid x^\top w = \gamma\}$ such that

$$a_i^\top w \le \gamma - 1, \quad i = 1, \ldots, m_1 \qquad \text{and} \qquad b_l^\top w \ge \gamma + 1, \quad l = 1, \ldots, m_2.$$

The error function satisfies $e(w, \gamma) = 0 \Leftrightarrow H(w, \gamma)$ is a (strictly) separating hyperplane:

$$e(w, \gamma) = \sum_{i=1}^{m_1} \max\{0, a_i^\top w - \gamma + 1\} + \sum_{l=1}^{m_2} \max\{0, -b_l^\top w + \gamma + 1\}.$$
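A direct NumPy transcription of the error function (the two point clouds are made up for illustration):

```python
import numpy as np

def classification_error(w, gamma, A, B):
    """e(w, gamma): total hinge violation of a_i^T w <= gamma - 1 (class A)
    and b_l^T w >= gamma + 1 (class B)."""
    err_A = np.maximum(0.0, A @ w - gamma + 1.0).sum()
    err_B = np.maximum(0.0, -(B @ w) + gamma + 1.0).sum()
    return err_A + err_B

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 4)) - 3.0               # class A cloud
B = rng.standard_normal((30, 4)) + 3.0               # class B cloud
w, gamma = np.ones(4), 0.0
print(classification_error(w, gamma, A, B))          # 0 when H(w, gamma) strictly separates
```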

SLIDE 21

The SVM model

The Support Vector Machine (SVM) model:

$$\min_{w, \gamma} \; \|w\| + C \, e(w, \gamma)$$

where the term $\|w\|$ is aimed at maximizing the separation margin and $C > 0$ is the trade-off parameter; the $\ell_1$ or $\ell_2$ norms are commonly used. Sparse Optimization (use of the $\ell_0$ pseudo-norm) is a way to pursue Feature Selection.
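A sketch of the ℓ1-regularized objective and of how feature selection is read off (the non-zero components of w are the retained features, as in the ft0 columns of the experiments below); the helper names are made up:

```python
import numpy as np

def svm_l1_objective(w, gamma, A, B, C):
    """||w||_1 + C * e(w, gamma); replacing ||w||_1 with the l0 pseudo-norm
    gives the sparse, feature-selecting variant."""
    err_A = np.maximum(0.0, A @ w - gamma + 1.0).sum()
    err_B = np.maximum(0.0, -(B @ w) + gamma + 1.0).sum()
    return np.abs(w).sum() + C * (err_A + err_B)

def selected_features(w, tol=1e-8):
    """Indices of the features actually used by the classifier."""
    return np.flatnonzero(np.abs(w) > tol)
```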



SLIDE 23

Benchmark datasets

Details of datasets

#   Name              #Samples   n
1   Breast-Cancer     683        10
2   Diabetes          768        8
3   Heart             270        13
4   Ionosphere        351        34
5   Brain Tumor1      60         7129
6   Brain Tumor2      50         12625
7   DLBCL             77         7129
8   Leukemia/ALLAML   72         5327


SLIDE 24

Numerical results

k-norm Relaxation (use of ℓ0 norm in SVM)

Name            test(%)   train(%)   ft0(%)   ft-2(%)   ft-4(%)   ft-9(%)   cpu(s)
Breast-Cancer   96.78     97.23      16.00    76.00     76.00     76.00     0.35
Diabetes        76.96     77.52      27.50    86.25     87.50     87.50     0.42
Heart           83.33     85.84      10.00    80.00     80.00     80.77     0.12
Ionosphere      86.05     93.28      31.76    56.47     56.47     56.47     0.20
Brain Tumor1    63.62     68.40      0.010    0.015     0.015     0.015     3.91
Brain Tumor2    80.67     97.57      0.036    0.051     0.051     0.051     11.96
DLBCL           92.50     100.00     0.050    0.085     0.085     0.085     9.86
Leukemia        93.81     100.00     0.071    0.090     0.090     0.090     6.17

Standard Relaxation (use of ℓ1 norm in SVM)

Name            test(%)   train(%)   ft0(%)   ft-2(%)   ft-4(%)   ft-9(%)   cpu(s)
Breast-Cancer   96.63     97.23      8.00     86.00     86.00     86.00     0.30
Diabetes        76.83     77.52      25.00    91.25     92.50     92.50     0.40
Heart           84.07     85.05      2.31     85.38     86.92     86.92     0.07
Ionosphere      87.49     93.32      28.53    69.71     70.88     70.88     0.12
Brain Tumor1    58.38     77.41      0.000    0.192     0.205     0.206     1.49
Brain Tumor2    82.33     96.02      0.000    0.181     0.188     0.188     2.08
DLBCL           96.25     99.71      0.000    0.397     0.411     0.411     2.71
Leukemia        95.42     99.69      0.000    0.593     0.629     0.629     1.74


SLIDE 26

References

  • J. Gotoh, A. Takeda, K. Tono, DC formulations and algorithms for sparse optimization problems, Mathematical Programming, Ser. B, 169(1), pp. 141–176, 2018.
  • M. Gaudioso, E. Gorgone, J.-B. Hiriart-Urruty, Feature selection in SVM via polyhedral k-norms, Optimization Letters, 14(1), pp. 19–36, 2020.
  • M. Gaudioso, G. Giallombardo, G. Miglionico, A. M. Bagirov, Minimizing nonsmooth DC functions via successive DC piecewise-affine approximations, Journal of Global Optimization, 71, pp. 37–55, 2018.
  • M. Gaudioso, G. Giallombardo, G. Miglionico, Sparse optimization via vector k-norm and DC programming with an application to feature selection for Support Vector Machines, submitted.
