

SLIDE 1

Convex Optimization

  • 5. Duality
  • Prof. Ying Cui

Department of Electrical Engineering Shanghai Jiao Tong University

2018

SJTU Ying Cui 1 / 46

SLIDE 2

Outline

◮ Lagrange dual function
◮ Lagrange dual problem
◮ Geometric interpretation
◮ Optimality conditions
◮ Perturbation and sensitivity analysis
◮ Examples
◮ Generalized inequalities

SLIDE 3

Lagrangian

standard form problem (not necessarily convex)

  min_x  f0(x)
  s.t.   fi(x) ≤ 0, i = 1, ..., m
         hi(x) = 0, i = 1, ..., p

domain D = ∩_{i=0}^m dom fi ∩ ∩_{i=1}^p dom hi, optimal value p∗

◮ basic idea in Lagrangian duality: take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions
◮ Lagrangian: L : Rn × Rm × Rp → R, with dom L = D × Rm × Rp,

  L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x)

◮ weighted sum of objective and constraint functions
◮ λi is the Lagrange multiplier associated with fi(x) ≤ 0
◮ νi is the Lagrange multiplier associated with hi(x) = 0

SLIDE 4

Lagrange dual function

Lagrange dual function (or dual function): g : Rm × Rp → R,

  g(λ, ν) = inf_{x∈D} L(x, λ, ν) = inf_{x∈D} ( f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x) )

◮ g is concave even when the problem is not convex, as it is the pointwise infimum of a family of affine functions of (λ, ν)
  ◮ pointwise minimum or infimum of concave functions is concave
◮ g can be −∞ when L is unbounded below in x

SLIDE 5

Lower bound property

The dual function yields lower bounds on the optimal value of the primal problem: for any λ ⪰ 0 and any ν,

  g(λ, ν) ≤ p∗

◮ the inequality holds but is vacuous when g(λ, ν) = −∞
◮ the dual function gives a nontrivial lower bound only when λ ⪰ 0 and (λ, ν) ∈ dom g, i.e., g(λ, ν) > −∞
◮ refer to (λ, ν) with λ ⪰ 0, (λ, ν) ∈ dom g as dual feasible

proof: suppose x̃ is feasible, i.e., fi(x̃) ≤ 0 and hi(x̃) = 0, and λ ⪰ 0. Then

  Σ_{i=1}^m λi fi(x̃) + Σ_{i=1}^p νi hi(x̃) ≤ 0  ⟹  L(x̃, λ, ν) ≤ f0(x̃)

Hence,

  g(λ, ν) = inf_{x∈D} L(x, λ, ν) ≤ L(x̃, λ, ν) ≤ f0(x̃)

Minimizing over all feasible x̃ gives p∗ ≥ g(λ, ν).

SLIDE 6

Examples

Least-norm solution of linear equations

  min_x  xTx
  s.t.   Ax = b

dual function:
◮ to minimize L(x, ν) = xTx + νT(Ax − b) over x (an unconstrained convex problem), set the gradient equal to zero:

  ∇x L(x, ν) = 2x + ATν = 0  ⟹  x = −(1/2)ATν

◮ plug into L(x, ν) to obtain g:

  g(ν) = L(−(1/2)ATν, ν) = −(1/4)νTAATν − bTν

which is a concave quadratic function of ν, as −AAT ⪯ 0

lower bound property: p∗ ≥ −(1/4)νTAATν − bTν, for all ν
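The least-norm example can be checked numerically: the primal optimum is x∗ = AT(AAT)⁻¹b, and maximizing the concave quadratic g over ν gives ν∗ = −2(AAT)⁻¹b, where strong duality g(ν∗) = p∗ holds. A minimal numpy sketch (the random A and b are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))   # fat matrix: Ax = b is underdetermined
b = rng.standard_normal(3)

# primal optimum of min x^T x s.t. Ax = b:  x* = A^T (A A^T)^{-1} b
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

# dual function g(nu) = -(1/4) nu^T A A^T nu - b^T nu
def g(nu):
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

# lower bound property: any nu gives g(nu) <= p*
nu = rng.standard_normal(3)
assert g(nu) <= p_star + 1e-9

# maximizing g: gradient -(1/2) A A^T nu - b = 0  =>  nu* = -2 (A A^T)^{-1} b
nu_star = -2.0 * np.linalg.solve(A @ A.T, b)
assert abs(g(nu_star) - p_star) < 1e-9   # strong duality: d* = p*
```

Any other choice of ν yields a strictly smaller value of g, i.e., a looser lower bound on p∗.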

SLIDE 7

Examples

Standard form LP

  min_x  cTx
  s.t.   Ax = b, x ⪰ 0

dual function:
◮ Lagrangian

  L(x, λ, ν) = cTx + νT(Ax − b) − λTx = −bTν + (c + ATν − λ)Tx

is affine in x (bounded below only when identically zero)
◮ dual function

  g(λ, ν) = inf_x L(x, λ, ν) = { −bTν,  ATν − λ + c = 0;  −∞, otherwise }

lower bound property: nontrivial only when λ ⪰ 0 and ATν − λ + c = 0; hence p∗ ≥ −bTν if ATν + c ⪰ 0

SLIDE 8

Examples

Two-way partitioning problem (W ∈ Sn)

  min_x  xTWx
  s.t.   xi² = 1, i = 1, ..., n

◮ a nonconvex problem with 2^n discrete feasible points
◮ find the two-way partition of {1, ..., n} with least total cost
  ◮ Wij is the cost of assigning i, j to the same set
  ◮ −Wij is the cost of assigning i, j to different sets

dual function:

  g(ν) = inf_x ( xTWx + Σ_i νi(xi² − 1) ) = inf_x xT(W + diag(ν))x − 1Tν
       = { −1Tν,  W + diag(ν) ⪰ 0;  −∞, otherwise }

lower bound property: p∗ ≥ −1Tν if W + diag(ν) ⪰ 0
example: ν = −λmin(W)1 gives the bound p∗ ≥ nλmin(W)
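For a small instance the eigenvalue bound can be compared against the exact optimum, which is computable by brute force over all 2^n sign vectors. A sketch (the random W and n = 8 are illustrative assumptions):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                       # symmetric cost matrix in S^n

# exact optimum: enumerate all 2^n vectors with entries +-1
p_star = min(np.array(s) @ W @ np.array(s)
             for s in itertools.product([-1.0, 1.0], repeat=n))

# dual lower bound from nu = -lambda_min(W) * 1:  p* >= n * lambda_min(W)
lam_min = np.linalg.eigvalsh(W)[0]      # smallest eigenvalue of W
bound = n * lam_min
assert bound <= p_star + 1e-9
```

The bound is cheap (one eigenvalue computation) while the exact optimum costs 2^n evaluations, which is the point of the duality-based relaxation.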

SLIDE 9

Lagrange dual function and conjugate function

◮ conjugate f∗ of a function f : Rn → R:

  f∗(y) = sup_{x∈dom f} (yTx − f(x))

◮ dual function of

  min_x  f0(x)  s.t.  x = 0

is

  g(ν) = inf_x ( f0(x) + νTx ) = − sup_x ( (−ν)Tx − f0(x) )

◮ relationship: g(ν) = −f0∗(−ν)
  ◮ conjugate of any function is convex
  ◮ dual function of any problem is concave

SLIDE 10

Lagrange dual function and conjugate function

more generally (and more usefully), consider an optimization problem with linear inequality and equality constraints

  min_x  f0(x)
  s.t.   Ax ⪯ b, Cx = d

dual function:

  g(λ, ν) = inf_{x∈dom f0} ( f0(x) + λT(Ax − b) + νT(Cx − d) )
          = inf_{x∈dom f0} ( f0(x) + (ATλ + CTν)Tx ) − bTλ − dTν
          = −f0∗(−ATλ − CTν) − bTλ − dTν

the domain of g follows from the domain of f0∗:

  dom g = {(λ, ν) | −ATλ − CTν ∈ dom f0∗}

◮ simplifies derivation of the dual function if the conjugate of f0 is known

SLIDE 11

Examples

Equality constrained norm minimization

  min_x  ||x||
  s.t.   Ax = b

dual function:

  g(ν) = −bTν − f0∗(−ATν) = { −bTν,  ||ATν||∗ ≤ 1;  −∞, otherwise }

◮ conjugate of f0 = || · ||:

  f0∗(y) = { 0,  ||y||∗ ≤ 1;  ∞, otherwise }

i.e., the indicator function of the dual norm unit ball, where ||y||∗ = sup_{||u||≤1} uTy is the dual norm of || · ||

SLIDE 12

Lagrange dual problem

  max_{λ,ν}  g(λ, ν)
  s.t.       λ ⪰ 0

◮ find the best lower bound on p∗ obtained from the Lagrange dual function
◮ always a convex optimization problem (maximize a concave function over a convex set), regardless of convexity of the primal problem; optimal value denoted by d∗
◮ λ, ν are dual feasible if λ ⪰ 0 and g(λ, ν) > −∞ (i.e., (λ, ν) ∈ dom g = {(λ, ν) | g(λ, ν) > −∞})
◮ can often be simplified by making the implicit constraint (λ, ν) ∈ dom g explicit, e.g.,
  ◮ standard form LP and its dual:

    min_x  cTx                 max_ν  −bTν
    s.t.   Ax = b, x ⪰ 0        s.t.   ATν + c ⪰ 0

SLIDE 13

Weak duality and strong duality

weak duality: d∗ ≤ p∗
◮ always holds (for convex and nonconvex problems)
◮ can be used to find nontrivial lower bounds for difficult problems, e.g.,
  ◮ solving the SDP

    max_ν  −1Tν
    s.t.   W + diag(ν) ⪰ 0

  gives a lower bound for the two-way partitioning problem

strong duality: d∗ = p∗
◮ does not hold in general
◮ (usually) holds for convex problems
◮ conditions that guarantee strong duality in convex problems are called constraint qualifications
  ◮ there exist many types of constraint qualifications

SLIDE 14

Slater’s constraint qualification

One simple constraint qualification is Slater’s condition (Slater’s constraint qualification): the convex problem is strictly feasible, i.e., there exists an x ∈ int D such that

  fi(x) < 0, i = 1, ..., m,  Ax = b

◮ can be refined, e.g.,
  ◮ can replace int D with relint D (interior relative to affine hull)
  ◮ affine inequalities do not need to hold with strict inequality
  ◮ reduces to feasibility when the constraints are all affine equalities and inequalities
◮ implies strong duality for convex problems
◮ implies that the dual optimal value is attained when d∗ > −∞, i.e., there exists a dual feasible (λ∗, ν∗) with g(λ∗, ν∗) = d∗ = p∗

SLIDE 15

Examples

Inequality form LP

primal problem:

  min_x  cTx
  s.t.   Ax ⪯ b

dual function:

  g(λ) = inf_x ( (c + ATλ)Tx − bTλ ) = { −bTλ,  ATλ + c = 0;  −∞, otherwise }

dual problem:

  max_λ  −bTλ
  s.t.   ATλ + c = 0, λ ⪰ 0

◮ from a weaker form of Slater’s condition: strong duality holds for any LP provided the primal problem is feasible; by symmetry, strong duality also holds for LPs whose dual is feasible
◮ in fact, p∗ = d∗ except when both primal and dual are infeasible

SLIDE 16

Examples

Quadratic program: P ∈ Sn++

  min_x  xTPx
  s.t.   Ax ⪯ b

dual function:

  g(λ) = inf_x ( xTPx + λT(Ax − b) ) = −(1/4)λTAP⁻¹ATλ − bTλ

dual problem:

  max_λ  −(1/4)λTAP⁻¹ATλ − bTλ
  s.t.   λ ⪰ 0

◮ from a weaker form of Slater’s condition: strong duality holds provided the primal problem is feasible
◮ in fact, p∗ = d∗ always holds
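Weak duality for this QP is easy to check numerically: for any λ ⪰ 0 and any feasible x, g(λ) ≤ xTPx. A sketch with randomly generated data (constructing b from a known point z so the primal is strictly feasible is an assumption made for the test):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 6
P = rng.standard_normal((n, n))
P = P @ P.T + np.eye(n)          # P in S^n_++
A = rng.standard_normal((m, n))
z = rng.standard_normal(n)
b = A @ z + 1.0                  # z is strictly feasible: A z = b - 1 < b

def g(lam):
    # dual function: -(1/4) lam^T A P^{-1} A^T lam - b^T lam
    M = A @ np.linalg.solve(P, A.T)
    return -0.25 * lam @ M @ lam - b @ lam

lam = np.abs(rng.standard_normal(m))      # an arbitrary lam >= 0
assert g(lam) <= z @ P @ z + 1e-9         # weak duality: g(lam) <= p* <= f0(z)
```

The inequality chain g(λ) ≤ p∗ ≤ f0(z) holds for every λ ⪰ 0 and every feasible z; no solver is needed to certify the bound.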

SLIDE 17

Examples

A nonconvex problem with strong duality (A ∈ Sn, A ⋡ 0):

  min_x  xTAx + 2bTx
  s.t.   xTx ≤ 1

dual function:

  g(λ) = inf_x ( xT(A + λI)x + 2bTx − λ )
       = { −bT(A + λI)†b − λ,  A + λI ⪰ 0, b ∈ R(A + λI);  −∞, otherwise }

dual problem and equivalent SDP:

  max_λ  −bT(A + λI)†b − λ            max_{λ,t}  −t − λ
  s.t.   A + λI ⪰ 0, b ∈ R(A + λI)    s.t.  [ A + λI   b
                                              bT       t ] ⪰ 0

◮ strong duality holds although the primal problem is nonconvex (difficult to show)
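Even without solving either problem, the lower bound can be spot-checked: pick any λ ≥ 0 with A + λI ≻ 0 and compare g(λ) against the objective at sampled feasible points. A sketch (the random indefinite A and the sampling are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
A = A - (np.linalg.eigvalsh(A)[0] + 1.0) * np.eye(n)  # shift: lambda_min(A) = -1, so A is not PSD

b = rng.standard_normal(n)

# choose lam so that A + lam*I is positive definite (then b is in its range)
lam = max(0.0, -np.linalg.eigvalsh(A)[0]) + 0.5
g_lam = -b @ np.linalg.solve(A + lam * np.eye(n), b) - lam

# weak duality: g(lam) lower-bounds f0(x) = x^T A x + 2 b^T x on the unit ball
for _ in range(1000):
    x = rng.standard_normal(n)
    x /= max(1.0, np.linalg.norm(x))     # project into the feasible set ||x|| <= 1
    assert g_lam <= x @ A @ x + 2 * b @ x + 1e-9
```

Sampling only certifies the direction g(λ) ≤ f0(x); the (hard) strong-duality claim says the gap closes at the optimal λ.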

SLIDE 18

Geometric interpretation

geometric interpretation via set of values
◮ set of values taken on by the constraint and objective functions:

  G = {(f1(x), ..., fm(x), h1(x), ..., hp(x), f0(x)) ∈ Rm × Rp × R | x ∈ D}

◮ optimal value: p∗ = inf{t | (u, v, t) ∈ G, u ⪯ 0, v = 0}
◮ dual function: g(λ, ν) = inf{(λ, ν, 1)T(u, v, t) | (u, v, t) ∈ G}
  ◮ if the infimum is finite, then (λ, ν, 1)T(u, v, t) ≥ g(λ, ν) defines a nonvertical supporting hyperplane to G
◮ weak duality: for all λ ⪰ 0,

  p∗ = inf{t | (u, v, t) ∈ G, u ⪯ 0, v = 0}
     ≥ inf{(λ, ν, 1)T(u, v, t) | (u, v, t) ∈ G, u ⪯ 0, v = 0}
     ≥ inf{(λ, ν, 1)T(u, v, t) | (u, v, t) ∈ G}
     = g(λ, ν)

SLIDE 19

Geometric interpretation

Example: consider a simple problem with one constraint

  min_x  f0(x)          p∗ = inf{t | (u, t) ∈ G, u ≤ 0}
  s.t.   f1(x) ≤ 0       g(λ) = inf_{(u,t)∈G} (t + λu)

where G = {(f1(x), f0(x)) | x ∈ D}
◮ λu + t = g(λ) is a (nonvertical) supporting hyperplane to G
◮ the hyperplane intersects the t-axis at t = g(λ)

[Figure 5.3: geometric interpretation of the dual function and lower bound g(λ) ≤ p⋆ for a problem with one inequality constraint. Given λ, minimizing (λ, 1)T(u, t) over G = {(f1(x), f0(x)) | x ∈ D} yields a supporting hyperplane with slope −λ; its intersection with the u = 0 axis gives g(λ).]

[Figure 5.4: supporting hyperplanes corresponding to three dual feasible values of λ, including the optimum λ⋆. Strong duality does not hold; the optimal duality gap p⋆ − d⋆ is positive.]

SLIDE 20

Geometric interpretation

geometric interpretation via epigraph
◮ epigraph form of G:

  A = G + (Rm+ × {0} × R+)
    = {(u, v, t) | ∃x ∈ D, fi(x) ≤ ui, i = 1, ..., m, hi(x) = vi, i = 1, ..., p, f0(x) ≤ t}

includes all points with larger objective or inequality constraint function values
◮ optimal value: p∗ = inf{t | (0, 0, t) ∈ A}
◮ dual function: if λ ⪰ 0, then g(λ, ν) = inf{(λ, ν, 1)T(u, v, t) | (u, v, t) ∈ A}
  ◮ if the infimum is finite, then (λ, ν, 1)T(u, v, t) ≥ g(λ, ν) defines a nonvertical supporting hyperplane to A
◮ weak duality: p∗ = (λ, ν, 1)T(0, 0, p∗) ≥ g(λ, ν)
◮ strong duality: holds iff there exists a nonvertical supporting hyperplane to A at its boundary point (0, 0, p∗)
  ◮ for a convex problem, A is convex, hence has a supporting hyperplane at (0, 0, p∗)
  ◮ Slater’s condition guarantees that the supporting hyperplane is nonvertical

SLIDE 21

Geometric interpretation

Example: consider a simple problem with one constraint

  min_x  f0(x)          p∗ = inf{t | (0, t) ∈ A}
  s.t.   f1(x) ≤ 0       g(λ) = inf{(λ, 1)T(u, t) | (u, t) ∈ A}

where A = {(u, t) | ∃x ∈ D, f1(x) ≤ u, f0(x) ≤ t}

[Figure 5.5: geometric interpretation of the dual function and lower bound g(λ) ≤ p⋆ for a problem with one inequality constraint. Given λ, minimizing (λ, 1)T(u, t) over A yields a supporting hyperplane with slope −λ; its intersection with the u = 0 axis gives g(λ).]

SLIDE 22

Certificate of suboptimality and stopping criteria

do not assume the primal problem is convex, and let x and (λ, ν) be a primal feasible point and a dual feasible point, respectively
◮ (λ, ν) provides a proof or certificate that p∗ ≥ g(λ, ν)
◮ (λ, ν) bounds how suboptimal x is without knowing p∗:

  f0(x) − p∗ ≤ f0(x) − g(λ, ν)

  ◮ provides nonheuristic stopping criteria in optimization algorithms
◮ x, (λ, ν) localize p∗, d∗ to an interval: p∗, d∗ ∈ [g(λ, ν), f0(x)], whose width is the duality gap f0(x) − g(λ, ν) associated with x and (λ, ν)
◮ if f0(x) = g(λ, ν), then x is primal optimal and (λ, ν) is dual optimal
  ◮ (λ, ν) is a certificate that proves x is optimal
  ◮ x is a certificate that proves (λ, ν) is dual optimal

SLIDE 23

Complementary slackness

Let x∗ and (λ∗, ν∗) be any primal optimal and dual optimal points, and assume strong duality holds. Then

  f0(x∗) = g(λ∗, ν∗) = inf_x ( f0(x) + Σ_{i=1}^m λ∗i fi(x) + Σ_{i=1}^p ν∗i hi(x) )
         ≤ f0(x∗) + Σ_{i=1}^m λ∗i fi(x∗) + Σ_{i=1}^p ν∗i hi(x∗)
         ≤ f0(x∗)

Hence the two inequalities hold with equality, implying:
◮ x∗ minimizes L(x, λ∗, ν∗) over x (L(x, λ∗, ν∗) can have other minimizers)
◮ complementary slackness: λ∗i fi(x∗) = 0, i = 1, ..., m, i.e.,

  λ∗i > 0 ⟹ fi(x∗) = 0,    fi(x∗) < 0 ⟹ λ∗i = 0

i.e., λ∗i = 0 unless the ith constraint is active at the optimum

SLIDE 24

Karush-Kuhn-Tucker (KKT) conditions

Consider any optimization problem with differentiable objective and constraint functions. The following four conditions are called the KKT conditions:
◮ primal constraints: fi(x) ≤ 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p
◮ dual constraints: λ ⪰ 0
◮ complementary slackness: λi fi(x) = 0, i = 1, ..., m
◮ gradient of L(x, λ, ν) with respect to x vanishes:

  ∇f0(x) + Σ_{i=1}^m λi ∇fi(x) + Σ_{i=1}^p νi ∇hi(x) = 0

SLIDE 25

Karush-Kuhn-Tucker (KKT) conditions

consider any optimization problem with differentiable objective and constraint functions

KKT conditions for nonconvex/convex problems
◮ for any optimization problem, if strong duality holds, any pair of primal and dual optimal points x∗, (λ∗, ν∗) must satisfy the KKT conditions
◮ proof: the first and second conditions hold since x∗ and (λ∗, ν∗) are primal and dual feasible. The third condition is shown on slide 23. The fourth condition follows from the fact that x∗ = arg min_x L(x, λ∗, ν∗) (shown on slide 23) and L(x, λ∗, ν∗) is differentiable.

SLIDE 26

Karush-Kuhn-Tucker (KKT) conditions

KKT conditions for convex problems
◮ for any convex optimization problem, any points x̃ and (λ̃, ν̃) that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap
◮ proof: the first and second conditions state that x̃ and (λ̃, ν̃) are primal and dual feasible, respectively. Since L(x, λ̃, ν̃) is convex in x (as λ̃ ⪰ 0), the fourth condition implies that x̃ = arg min_x L(x, λ̃, ν̃), i.e.,

  g(λ̃, ν̃) = L(x̃, λ̃, ν̃) = f0(x̃) + Σ_{i=1}^m λ̃i fi(x̃) + Σ_{i=1}^p ν̃i hi(x̃) = f0(x̃),

where the last equality is due to the first and third conditions. g(λ̃, ν̃) = f0(x̃) means zero duality gap, implying that x̃ and (λ̃, ν̃) are primal and dual optimal.
◮ if a convex optimization problem satisfies Slater’s condition, then the KKT conditions provide necessary and sufficient conditions for optimality, i.e.,
  ◮ x is optimal iff there exist (λ, ν) that, together with x, satisfy the KKT conditions

SLIDE 27

Karush-Kuhn-Tucker (KKT) conditions

KKT conditions play an important role in optimization:
◮ in a few special cases, it is possible to solve the KKT conditions (and therefore the optimization problem) analytically
◮ more generally, many algorithms for convex optimization are conceived as, or can be interpreted as, methods for solving the KKT conditions

SLIDE 28

Example

water-filling (assume αi > 0)

  min_x  −Σ_{i=1}^n log(xi + αi)
  s.t.   x ⪰ 0, 1Tx = 1

x is optimal iff x ⪰ 0, 1Tx = 1, and there exist λ ∈ Rn, ν ∈ R such that

  λ ⪰ 0,  λi xi = 0,  1/(xi + αi) + λi = ν

◮ if ν < 1/αi: λi = 0 and xi = 1/ν − αi
◮ if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
◮ determine ν from 1Tx = Σ_{i=1}^n max{0, 1/ν − αi} = 1

thus, the optimal point is given by

  x∗i = max{0, 1/ν∗ − αi}

where ν∗ satisfies Σ_{i=1}^n max{0, 1/ν∗ − αi} = 1
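The KKT solution reduces the problem to a one-dimensional root-find for the water level t = 1/ν∗, since Σi max{0, t − αi} is nondecreasing in t. A minimal sketch using bisection (the bracket choice and the sample αi are illustrative assumptions):

```python
def water_fill(alpha, total=1.0, iters=100):
    """Solve min -sum(log(x_i + alpha_i)) s.t. x >= 0, sum(x) = total,
    via the KKT conditions: x_i = max(0, 1/nu - alpha_i)."""
    # bisection on the water level t = 1/nu*: sum_i max(0, t - alpha_i) = total.
    # At t = min(alpha) the sum is 0; at t = min(alpha) + total it is >= total.
    lo, hi = min(alpha), min(alpha) + total
    for _ in range(iters):
        t = (lo + hi) / 2
        if sum(max(0.0, t - a) for a in alpha) < total:
            lo = t
        else:
            hi = t
    t = (lo + hi) / 2
    return [max(0.0, t - a) for a in alpha], 1.0 / t

alpha = [0.4, 0.1, 0.7, 0.25]
x, nu = water_fill(alpha)
assert abs(sum(x) - 1.0) < 1e-9          # primal feasibility: 1^T x = 1
# complementary slackness: whenever x_i > 0, the level x_i + alpha_i equals 1/nu
assert all(xi == 0.0 or abs(1.0 / nu - (xi + ai)) < 1e-9
           for xi, ai in zip(x, alpha))
```

With these αi the patch at height 0.7 may stay dry (xi = 0, covered by λi = ν − 1/αi ≥ 0), which is exactly the complementary slackness case split above.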

SLIDE 29

Example

interpretation:
◮ n patches; the level of patch i is at height αi
◮ flood the area with a unit amount of water
◮ the resulting water level is 1/ν∗
◮ the depth of water above patch i is x∗i

[Figure 5.7: illustration of water-filling. The height of each patch is αi; the region is flooded to a level 1/ν⋆, which uses a total quantity of water equal to one. The height of the water above each patch is the optimal value x⋆i.]

SLIDE 30

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual:

  min_x  f0(x)                        max_{λ,ν}  g(λ, ν)
  s.t.   fi(x) ≤ 0, i = 1, ..., m      s.t.       λ ⪰ 0
         hi(x) = 0, i = 1, ..., p

perturbed problem and its dual:

  min_x  f0(x)                        max_{λ,ν}  g(λ, ν) − uTλ − vTν
  s.t.   fi(x) ≤ ui, i = 1, ..., m     s.t.       λ ⪰ 0
         hi(x) = vi, i = 1, ..., p

SLIDE 31

Perturbation and sensitivity analysis

◮ x is the primal variable, and u, v are parameters
  ◮ tighten (ui < 0) or relax (ui > 0) the ith inequality constraint by ui
  ◮ change the righthand side of the ith equality constraint by vi
◮ p∗(u, v) is the optimal value of the perturbed problem, as a function of the perturbations to the righthand sides of the constraints
  ◮ p∗(0, 0) = p∗
  ◮ when p∗(u, v) = ∞, the perturbations of the constraints result in infeasibility
  ◮ when the unperturbed problem is convex, p∗(u, v) is a convex function of u and v
◮ interested in information about p∗(u, v) obtained from the solution of the unperturbed problem and its dual

SLIDE 32

Perturbation and sensitivity analysis

global sensitivity: assume that strong duality holds for the unperturbed problem, and that the dual optimum is attained. Let (λ∗, ν∗) be dual optimal for the unperturbed problem. Then for all u and v,

  p∗(u, v) ≥ p∗(0, 0) − uTλ∗ − vTν∗

global sensitivity interpretation:
◮ large λ∗i: p∗ increases greatly if we tighten constraint i (ui < 0)
◮ small λ∗i: p∗ does not decrease much if we loosen constraint i (ui > 0)
◮ large and positive ν∗i: p∗ increases greatly if we take vi < 0; large and negative ν∗i: p∗ increases greatly if we take vi > 0
◮ small and positive ν∗i: p∗ does not decrease much if we take vi > 0; small and negative ν∗i: p∗ does not decrease much if we take vi < 0

SLIDE 33

Perturbation and sensitivity analysis

proof: apply weak duality to the perturbed problem and then strong duality to the unperturbed problem:

  p∗(u, v) ≥ g(λ∗, ν∗) − uTλ∗ − vTν∗ = p∗(0, 0) − uTλ∗ − vTν∗

example: p∗(u) for a problem with one inequality constraint:

[Figure 5.10: optimal value p⋆(u) of a convex problem with one constraint f1(x) ≤ u, as a function of u. For u = 0 we have the original unperturbed problem; for u < 0 the constraint is tightened, and for u > 0 it is loosened. The affine function p⋆(0) − λ⋆u is a lower bound on p⋆(u).]

slide-34
SLIDE 34

Perturbation and sensitivity analysis

local sensitivity If (in addition) p∗(u, v) is differentiable at (0, 0), then λ∗

i = −∂p∗(0, 0)

∂ui , ν∗

i = −∂p∗(0, 0)

∂vi local sensitivity interpretation: ◮ optimal Lagrange multipliers are exactly the local sensitivities

  • f the optimal value with respect to constraint perturbations

◮ tightening (loosening) ith inequality constraint a small amount yields an increase (a decrease) in p∗ of approximately −λ∗

i ui

(λ∗

i ui)

◮ local sensitivity result gives us a quantitative measure of how active a constraint is at the optimum x∗

◮ fi(x∗) < 0: constraint can be tightened or loosened a small amount without affecting the optimal value, as λ∗

i = 0

◮ fi(x∗) = 0: small (large) λ∗

i means that constraint can be

loosened or tightened a bit without much (with great) effect

  • n the optimal value

SJTU Ying Cui 34 / 46
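Both sensitivity results can be verified on a one-dimensional toy problem that is not from the slides: min x² s.t. 1 − x ≤ u, for which p∗(u) = (1 − u)² when u ≤ 1 (optimum at x = 1 − u) and the KKT conditions at u = 0 (2x − λ = 0 with x = 1) give λ∗ = 2:

```python
# p*(u) for  min x^2  s.t.  1 - x <= u  is (1 - u)^2 when u <= 1, else 0
def p_star(u):
    return max(0.0, 1.0 - u) ** 2

lam_star = 2.0                 # KKT multiplier of the unperturbed problem (u = 0)

# local sensitivity: dp*/du at u = 0 should equal -lam_star
h = 1e-6
num_deriv = (p_star(h) - p_star(-h)) / (2 * h)   # central finite difference
assert abs(num_deriv - (-lam_star)) < 1e-4

# global sensitivity: p*(u) >= p*(0) - lam_star * u for all u
for u in [-0.5, -0.1, 0.0, 0.3, 2.0]:
    assert p_star(u) >= p_star(0.0) - lam_star * u - 1e-12
```

The global inequality here is (1 − u)² ≥ 1 − 2u, i.e., u² ≥ 0, so the affine lower bound p∗(0) − λ∗u is tight exactly at u = 0, as in Figure 5.10.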

SLIDE 35

Perturbation and sensitivity analysis

proof (for λ∗i): choosing u = t ei and v = 0, the global sensitivity result gives

  ( p∗(t ei, 0) − p∗(0, 0) ) / t ≥ −λ∗i,  t > 0
  ( p∗(t ei, 0) − p∗(0, 0) ) / t ≤ −λ∗i,  t < 0

  ⟹  lim_{t↓0} ( p∗(t ei, 0) − p∗(0, 0) ) / t ≥ −λ∗i
      lim_{t↑0} ( p∗(t ei, 0) − p∗(0, 0) ) / t ≤ −λ∗i

Since p∗(u, v) is differentiable at (0, 0), the two limits coincide; thus ∂p∗(0, 0)/∂ui = −λ∗i.

SLIDE 36

Duality and problem reformulations

◮ equivalent formulations of a problem can lead to very different dual problems
◮ reformulating the primal problem can be useful when the dual problem is difficult to derive or uninteresting

common reformulations:
◮ introduce new variables and associated equality constraints
◮ replace the objective with an increasing function of the original objective
◮ make explicit constraints implicit (i.e., incorporate them into the domain of the objective) or vice versa

SLIDE 37

Introducing new variables and equality constraints

unconstrained problem: min_x f0(Ax + b)
◮ dual function is constant: g = inf_x f0(Ax + b) = p∗
◮ strong duality holds, i.e., p∗ = d∗, but the dual is not useful

reformulated problem and its dual:

  min_{x,y}  f0(y)              max_ν  bTν − f0∗(ν)
  s.t.       Ax + b − y = 0      s.t.   ATν = 0

the dual function follows from

  g(ν) = inf_{x,y} ( f0(y) + νT(Ax + b − y) )
       = inf_{x,y} ( f0(y) − νTy + νTAx ) + bTν
       = { inf_y ( f0(y) − νTy ) + bTν,  ATν = 0;  −∞, otherwise }
       = { −f0∗(ν) + bTν,  ATν = 0;  −∞, otherwise }

SLIDE 38

Introducing new variables and equality constraints

minimum norm problem: min_x ||Ax − b||
◮ dual function is constant: g = inf_x ||Ax − b|| = p∗
◮ strong duality holds, i.e., p∗ = d∗, but the dual is not useful

reformulated problem and its dual:

  min_{x,y}  ||y||           max_ν  bTν
  s.t.       y = Ax − b       s.t.   ATν = 0, ||ν||∗ ≤ 1

the dual function follows from

  g(ν) = inf_{x,y} ( ||y|| + νT(y − Ax + b) )
       = { inf_y ( ||y|| + νTy ) + bTν,  ATν = 0;  −∞, otherwise }
       = { bTν,  ATν = 0, ||ν||∗ ≤ 1;  −∞, otherwise }

SLIDE 39

Transforming the objective

replacing the objective with an increasing function of the original objective

minimum norm problem: min_x ||Ax − b||

reformulated problem and its dual:

  min_{x,y}  (1/2)||y||²         max_ν  −(1/2)||ν||∗² + bTν
  s.t.       y = Ax − b           s.t.   ATν = 0

the dual function follows from

  g(ν) = inf_{x,y} ( (1/2)||y||² + νT(y − Ax + b) )
       = { inf_y ( (1/2)||y||² + νTy ) + bTν,  ATν = 0;  −∞, otherwise }
       = { −(1/2)||ν||∗² + bTν,  ATν = 0;  −∞, otherwise }

last equality: the conjugate of (1/2)|| · ||² is (1/2)|| · ||∗² (Ex. 3.27, p. 93)

SLIDE 40

Implicit constraints

make explicit constraints implicit (i.e., incorporate them into the domain of the objective) or vice versa

LP with box constraints: primal and dual problem

  min_x  cTx              max_{λ1,λ2,ν}  −bTν − 1Tλ1 − 1Tλ2
  s.t.   Ax = b            s.t.           c + ATν + λ1 − λ2 = 0
         −1 ⪯ x ⪯ 1                       λ1 ⪰ 0, λ2 ⪰ 0

reformulated problem with the box constraints made implicit, and its dual:

  min_x  f0(x) = { cTx,  −1 ⪯ x ⪯ 1;  ∞, otherwise }      max_ν  −bTν − ||ATν + c||1
  s.t.   Ax = b

the dual function follows from

  g(ν) = inf_{−1⪯x⪯1} ( cTx + νT(Ax − b) ) = −bTν − ||ATν + c||1
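The key step is the infimum of a linear function over the box: inf over −1 ⪯ x ⪯ 1 of (c + ATν)Tx equals −||ATν + c||1, attained at x = −sign(ATν + c). A numerical check (the random data is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)
nu = rng.standard_normal(m)

w = A.T @ nu + c
x_min = -np.sign(w)                      # minimizer of w^T x over the box
inner = w @ x_min                        # equals -||w||_1
assert abs(inner + np.abs(w).sum()) < 1e-12

# no point of the box does better than -||w||_1
for _ in range(1000):
    x = rng.uniform(-1.0, 1.0, n)
    assert w @ x >= inner - 1e-12

g_nu = -b @ nu - np.abs(w).sum()         # g(nu) = -b^T nu - ||A^T nu + c||_1
```

Each coordinate is chosen independently (xi = −1 if wi > 0, +1 if wi < 0), which is why the box infimum separates into the ℓ1 norm.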

SLIDE 41

Problems with generalized inequality constraints

do not assume convexity of the problem

  p∗:  min_x  f0(x)
       s.t.   fi(x) ⪯_{Ki} 0, i = 1, ..., m
              hi(x) = 0, i = 1, ..., p

Ki ⊆ R^{ki} is a proper cone; ⪯_{Ki} is a generalized inequality on R^{ki}
◮ Lagrange multiplier vector associated with fi(x) ⪯_{Ki} 0: λi ∈ R^{ki}; Lagrange multiplier associated with hi(x) = 0: νi ∈ R
◮ Lagrangian L : Rn × R^{k1} × ... × R^{km} × Rp → R:

  L(x, λ1, ..., λm, ν) = f0(x) + Σ_{i=1}^m λiT fi(x) + Σ_{i=1}^p νi hi(x)

◮ (concave) dual function g : R^{k1} × ... × R^{km} × Rp → R:

  g(λ1, ..., λm, ν) = inf_{x∈D} L(x, λ1, ..., λm, ν)

SLIDE 42

Lower bound property

The dual function yields lower bounds on the optimal value of the primal problem: for any λi ⪰_{Ki∗} 0 and any ν,

  p∗ ≥ g(λ1, ..., λm, ν)

proof: if x̃ is feasible and λi ⪰_{Ki∗} 0, then

  f0(x̃) ≥ f0(x̃) + Σ_{i=1}^m λiT fi(x̃) + Σ_{i=1}^p νi hi(x̃)
        ≥ inf_{x∈D} L(x, λ1, ..., λm, ν) = g(λ1, ..., λm, ν)

where the first inequality follows from the definition of the dual cone (λiT fi(x̃) ≤ 0 when fi(x̃) ⪯_{Ki} 0 and λi ⪰_{Ki∗} 0). Minimizing over all feasible x̃ gives p∗ ≥ g(λ1, ..., λm, ν).

dual problem:

  d∗:  max_{λ1,...,λm,ν}  g(λ1, ..., λm, ν)
       s.t.               λi ⪰_{Ki∗} 0, i = 1, ..., m

SLIDE 43

Weak duality and strong duality

weak duality: d∗ ≤ p∗
◮ always holds (for convex and nonconvex problems)
◮ can be used to find nontrivial lower bounds for difficult problems

strong duality: d∗ = p∗
◮ does not hold in general
◮ holds for convex problems with a constraint qualification, e.g.,
  ◮ Slater’s condition: the primal problem is strictly feasible

SLIDE 44

Examples

Semidefinite program

primal SDP (Fi, G ∈ Sk; positive semidefinite cone K1 = Sk+):

  min_x  cTx
  s.t.   x1F1 + ... + xnFn ⪯ G

◮ the Lagrange multiplier is a matrix Z ∈ Sk and the Lagrangian is

  L(x, Z) = cTx + tr( Z (x1F1 + ... + xnFn − G) )
          = x1(c1 + tr(F1Z)) + ... + xn(cn + tr(FnZ)) − tr(GZ)

◮ dual function

  g(Z) = inf_x L(x, Z) = { −tr(GZ),  tr(FiZ) + ci = 0, i = 1, ..., n;  −∞, otherwise }

dual SDP:

  max_Z  −tr(GZ)
  s.t.   Z ⪰ 0, tr(FiZ) + ci = 0, i = 1, ..., n

SLIDE 45

Examples

Cone program in standard form

primal CP (proper cone K ⊆ Rn):

  min_x  cTx
  s.t.   x ⪰_K 0, Ax = b

◮ Lagrange multipliers λ ∈ Rn, ν ∈ Rm, and the Lagrangian is

  L(x, λ, ν) = cTx − λTx + νT(Ax − b) = (ATν − λ + c)Tx − bTν

◮ dual function

  g(λ, ν) = inf_x L(x, λ, ν) = { −bTν,  ATν − λ + c = 0;  −∞, otherwise }

dual CP:

  max_ν  −bTν
  s.t.   ATν + c ⪰_{K∗} 0

SLIDE 46

KKT conditions

differentiable fi, hi
◮ primal constraints: fi(x) ⪯_{Ki} 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p
◮ dual constraints: λi ⪰_{Ki∗} 0
◮ complementary slackness: λiT fi(x) = 0, i = 1, ..., m, implying

  λi ≻_{Ki∗} 0 ⟹ fi(x) = 0,    fi(x) ≺_{Ki} 0 ⟹ λi = 0

◮ gradient of the Lagrangian with respect to x vanishes:

  ∇f0(x) + Σ_{i=1}^m Dfi(x)T λi + Σ_{i=1}^p νi ∇hi(x) = 0

(Dfi(x) is the Jacobian of the vector-valued constraint fi)

KKT conditions for nonconvex/convex problems: if strong duality holds, any primal optimal and any dual optimal points must satisfy the KKT conditions

KKT conditions for convex problems: if strong duality holds, the KKT conditions provide necessary and sufficient conditions for optimality