QP & cone program duality; Support vector machines (10-725, PowerPoint presentation)

SLIDE 1

QP & cone program duality Support vector machines

10-725 Optimization Geoff Gordon Ryan Tibshirani

SLIDE 2

Geoff Gordon—10-725 Optimization—Fall 2012

Review

  • Quadratic programs
  • Cone programs
  • SOCP, SDP
  • QP ⊆ SOCP ⊆ SDP
  • SOC, S₊ are self-dual
  • Poly-time algos (but not strongly poly-time, yet)
  • Examples: group lasso, Huber regression, matrix completion
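The inclusions QP ⊆ SOCP ⊆ SDP can be checked with two standard reformulations, sketched below. The sketch assumes a factor F with H = FᵀF (one exists whenever H is positive semidefinite); F is our notation, not the slides'. Squaring the middle condition gives 4‖Fx‖² + (2t − 1)² ≤ (2t + 1)², which simplifies to ‖Fx‖² ≤ 2t; the second line is a Schur-complement argument.

```latex
\begin{aligned}
\tfrac12 x^T H x \le t
  &\iff \|Fx\|_2^2 \le 2t
  \iff \left\| \begin{bmatrix} 2Fx \\ 2t - 1 \end{bmatrix} \right\|_2 \le 2t + 1
  && \text{(QP} \to \text{SOCP)} \\
\|u\|_2 \le s
  &\iff \begin{bmatrix} sI & u \\ u^T & s \end{bmatrix} \succeq 0
  && \text{(SOCP} \to \text{SDP)}
\end{aligned}
```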

SLIDE 3

Matrix completion

  • Observe $A_{ij}$ for $ij \in E$; write $O_{ij} = 1$ if $ij \in E$, $0$ otherwise
  • $\min_X \|(X - A) \circ O\|_F^2 + \lambda \|X\|_*$
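One standard way to attack this problem (not necessarily the method used in the lecture) is proximal gradient: a gradient step on the smooth data-fit term, then singular-value soft-thresholding, which is the proximal operator of the nuclear norm. A sketch, with O the 0/1 observation mask and a step size η of our choosing; since the gradient below is 2-Lipschitz, η ≤ 1/2 is a safe assumption.

```latex
\begin{aligned}
\nabla_X \|(X - A) \circ O\|_F^2 &= 2\,(X - A) \circ O \qquad (O_{ij} \in \{0, 1\}) \\
Y &= X - 2\eta\,(X - A) \circ O, \qquad Y = U \operatorname{diag}(\sigma) V^T \ \text{(SVD)} \\
X^{+} &= \operatorname{prox}_{\eta\lambda \|\cdot\|_*}(Y)
  = U \operatorname{diag}\big(\max(\sigma - \eta\lambda,\ 0)\big) V^T
\end{aligned}
```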
SLIDE 4

Max-variance unfolding

aka semidefinite embedding

  • Goal: given $x_1, \ldots, x_T \in \mathbb{R}^n$
  • find $y_1, \ldots, y_T \in \mathbb{R}^k$ ($k \ll n$)
  • $\|y_i - y_j\| \approx \|x_i - x_j\|\ \forall (i,j) \in E$
  • If $x_i$ were near a $k$-dim subspace of $\mathbb{R}^n$, PCA!
  • Instead, two steps:
  • first look for $z_1, \ldots, z_T \in \mathbb{R}^n$ with
  • $\|z_i - z_j\| = \|x_i - x_j\|\ \forall (i,j) \in E$
  • and $\mathrm{var}(z)$ as big as possible
  • then use PCA to get $y_i$ from $z_i$
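Both steps need the neighbor set E, which the slide takes as given. A common way to build it, assumed here and not stated on the slide, is to connect each point to its k nearest neighbors. The helper name `knn_edges` and the parameter `k` are hypothetical; the sketch returns each undirected edge once, together with the distance the embedding must preserve.

```python
import math


def knn_edges(points, k):
    """Neighbor set E for MVU, built by linking each point to its k nearest neighbors.

    Hypothetical helper (the slides take E as given). Returns {(i, j): ||x_i - x_j||}
    with i < j, so each undirected edge appears once with its target distance.
    """
    edges = set()
    for i, p in enumerate(points):
        # (distance, index) pairs: sorted by distance, ties broken by index
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return {(i, j): math.dist(points[i], points[j]) for i, j in sorted(edges)}
```

For four collinear points spaced one unit apart, `k = 1` links consecutive points only, giving three edges of length 1.0.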

SLIDE 5

MVU/SDE

  • $\max_z \operatorname{tr}(\operatorname{cov}(z))$ s.t. $\|z_i - z_j\| = \|x_i - x_j\|\ \forall (i,j) \in E$
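As written, this problem is not convex: it maximizes a convex function subject to quadratic equality constraints. The usual fix (the step that gives "semidefinite embedding" its name) is to optimize over the Gram matrix K with K_ij = z_iᵀz_j, which makes every constraint linear. The sketch below assumes z is centered (distances do not change under translation, so this loses nothing), uses the 1/T normalization for cov, and drops any bound on the dimension of z.

```latex
\begin{aligned}
K_{ij} &= z_i^T z_j, \qquad K \succeq 0 \\
\|z_i - z_j\|^2 &= K_{ii} + K_{jj} - 2K_{ij} \\
\textstyle\sum_i z_i = 0 &\iff \textstyle\sum_{ij} K_{ij} = 0 \\
\operatorname{tr}(\operatorname{cov}(z)) &= \tfrac1T \textstyle\sum_i \|z_i\|^2 = \tfrac1T \operatorname{tr}(K) \\
\max_K\ \operatorname{tr}(K)\ \ \text{s.t.}\ \ &K_{ii} + K_{jj} - 2K_{ij} = \|x_i - x_j\|^2\ \ \forall (i,j) \in E,
  \quad \textstyle\sum_{ij} K_{ij} = 0,\quad K \succeq 0
\end{aligned}
```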

SLIDE 6

Result

  • Embed 400 images of a teapot into 2d

[Figure: embedding with a query image and two images labeled A and B]

Euclidean distance from query to A is smaller; after MVU, distance to B is smaller

[Weinberger & Saul, AAAI, 2006]

SLIDE 7

Duality for QPs and Cone Ps

  • Combined QP/CP:
  • $\min\ c^T x + x^T H x/2$ s.t. $Ax + b \in K$, $x \in L$
  • cones $K, L$ implement any/all of equality, inequality, generalized inequality
  • assume $K, L$ proper (closed, convex, solid, pointed)

SLIDE 8

Primal-dual pair

  • Primal:
  • $\min\ c^T x + x^T H x/2$ s.t. $Ax + b \in K$, $x \in L$
  • Dual:
  • $\max\ -z^T H z/2 - b^T y$ s.t. $Hz + c - A^T y \in L^*$, $y \in K^*$
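A short calculation shows why this pair satisfies weak duality. It assumes H is symmetric positive semidefinite (not stated on the slide, but needed for the QP to be convex) and writes w = Hz + c − Aᵀy ∈ L* for the dual slack; w is our notation. Substituting c = w − Hz + Aᵀy into the primal objective splits the gap into three nonnegative terms:

```latex
\begin{aligned}
c^T x &= w^T x - z^T H x + y^T A x \\
\text{primal} - \text{dual}
  &= c^T x + \tfrac12 x^T H x + \tfrac12 z^T H z + b^T y \\
  &= \underbrace{w^T x}_{\ge 0:\ x \in L,\ w \in L^*}
   + \underbrace{y^T (Ax + b)}_{\ge 0:\ Ax + b \in K,\ y \in K^*}
   + \underbrace{\tfrac12 (x - z)^T H (x - z)}_{\ge 0:\ H \succeq 0}
\end{aligned}
```

The gap is zero exactly when all three terms vanish; for positive semidefinite H, the last term vanishes exactly when Hx = Hz.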

SLIDE 9

KKT conditions

  • $\min\ c^T x + x^T H x/2$ s.t. $Ax + b \in K$, $x \in L$
  • $\max\ -b^T y - z^T H z/2$ s.t. $Hz + c - A^T y \in L^*$, $y \in K^*$

(primal-dual pair)

SLIDE 10

KKT conditions

  • primal: $Ax + b \in K$, $x \in L$
  • dual: $Hz + c - A^T y \in L^*$, $y \in K^*$
  • quadratic: $Hx = Hz$
  • comp. slack: $y^T(Ax + b) = 0$, $x^T(Hz + c - A^T y) = 0$
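The four groups of conditions can be checked mechanically. The sketch below assumes K and L are nonnegative orthants (so K* = K and L* = L and every cone test reduces to "all entries ≥ 0") and uses plain lists rather than a linear-algebra library; `kkt_violation` and its helpers are hypothetical names. It returns the largest violation, so 0.0 means (x, y, z) satisfies all the conditions.

```python
def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kkt_violation(A, b, c, H, x, y, z):
    """Largest violation of the KKT conditions when K and L are nonnegative orthants."""
    At = [list(col) for col in zip(*A)]
    s = [ax + bi for ax, bi in zip(matvec(A, x), b)]          # primal slack Ax + b
    w = [hz + ci - aty                                          # dual slack Hz + c - A^T y
         for hz, ci, aty in zip(matvec(H, z), c, matvec(At, y))]
    checks = [
        max(0.0, -min(s)), max(0.0, -min(x)),                   # primal: Ax + b in K, x in L
        max(0.0, -min(w)), max(0.0, -min(y)),                   # dual: w in L*, y in K*
        max(abs(p - q) for p, q in zip(matvec(H, x), matvec(H, z))),  # quadratic: Hx = Hz
        abs(dot(y, s)), abs(dot(x, w)),                         # complementary slackness
    ]
    return max(checks)
```

For the toy problem min x²/2 s.t. x − 1 ≥ 0, x ≥ 0 (A = [[1]], b = [−1], c = [0], H = [[1]]), working the dual out by hand gives x = y = z = 1 with both objectives equal to 1/2, and `kkt_violation` returns 0.0 there; at x = 2 with the same y, z, complementary slackness fails by 1.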

SLIDE 11

Support vector machines

(separable case)

SLIDE 12

Maximizing margin

  • margin $M = \min_i y_i (x_i \cdot w - b)$, with $\|w\| = 1$
  • $\max M$ s.t. $M \le y_i (x_i \cdot w - b)\ \forall i$, $\|w\| = 1$
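For a fixed hyperplane (w, b) the margin is easy to evaluate; the optimization on the slide is the maximization of this quantity over (w, b). The sketch below divides by ‖w‖ so that rescaling (w, b) cannot inflate the value; the function name `margin` is ours.

```python
import math


def margin(X, y, w, b):
    """Smallest normalized margin min_i y_i (x_i . w - b) / ||w|| over the data.

    Dividing by ||w|| has the same effect as requiring ||w|| = 1.
    A positive result means (w, b) separates the data.
    """
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    return min(yi * (sum(xj * wj for xj, wj in zip(xi, w)) - b) / norm_w
               for xi, yi in zip(X, y))
```

With points (1, 0) and (2, 1) labeled +1 and (−1, 0) labeled −1, `margin(X, y, [1.0, 0.0], 0.0)` is 1.0, doubling w leaves it unchanged, and shifting b to 0.5 lowers it to 0.5.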
SLIDE 13

For example

SLIDE 14

Slacks

  • $\min_{v,d}\ \|v\|^2/2$ s.t. $y_i (x_i^T v - d) \ge 1\ \forall i$
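The step from maximizing the margin to this QP is a change of variables. A sketch, assuming the margin problem normalizes ‖w‖ = 1 and the data are separable (M > 0): dividing each margin constraint by M gives the QP's constraints, and the objective follows from ‖v‖ = 1/M.

```latex
\begin{aligned}
v = w/M,\quad d = b/M:\qquad
M \le y_i (x_i^T w - b) &\iff 1 \le y_i (x_i^T v - d) \quad \forall i \\
\|w\| = 1 &\ \Rightarrow\ \|v\| = 1/M \\
\max M &\iff \min \|v\| \iff \min \tfrac12 \|v\|^2
\end{aligned}
```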

SLIDE 15

SVM duality

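  • $\min\ \|v\|^2/2 + \sum_i s_i$ s.t. $y_i (x_i^T v - d) \ge 1 - s_i$, $s_i \ge 0\ \forall i$
  • $\min\ v^T v/2 + \mathbf{1}^T s$ s.t. $Av - yd + s - \mathbf{1} \ge 0$, $s \ge 0$, where row $i$ of $A$ is $y_i x_i^T$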
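The dual follows from one Lagrangian. The sketch below assumes the slack constraint s ≥ 0 and that row i of A is y_i x_iᵀ; the multipliers α (for the margin constraints) and μ (for s ≥ 0) are our notation. The upper bound α ≤ 1 comes from μ ≥ 0.

```latex
\begin{aligned}
\mathcal{L}(v, d, s, \alpha, \mu) &= \tfrac12 v^T v + \mathbf{1}^T s
  - \alpha^T (Av - yd + s - \mathbf{1}) - \mu^T s, \qquad \alpha, \mu \ge 0 \\
\nabla_v \mathcal{L} = 0 &\ \Rightarrow\ v = A^T \alpha \\
\partial_d \mathcal{L} = 0 &\ \Rightarrow\ y^T \alpha = 0 \\
\nabla_s \mathcal{L} = 0 &\ \Rightarrow\ \mathbf{1} - \alpha - \mu = 0\ \Rightarrow\ \alpha \le 1 \\
\text{substituting:}\quad &\max_\alpha\ \mathbf{1}^T \alpha - \tfrac12 \alpha^T A A^T \alpha
  \ \ \text{s.t.}\ \ y^T \alpha = 0,\ 0 \le \alpha \le 1
\end{aligned}
```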

SLIDE 16

Interpreting the dual

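  • $\max\ \mathbf{1}^T \alpha - \alpha^T K \alpha/2$ s.t. $y^T \alpha = 0$, $0 \le \alpha \le 1$, where $K = AA^T$, i.e. $K_{ij} = y_i y_j x_i^T x_j$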

  • $\alpha$:
  • $\alpha > 0$:
  • $\alpha < 1$:
  • $y^T \alpha = 0$:

SLIDE 17

From dual to primal

  • $\max\ \mathbf{1}^T \alpha - \alpha^T K \alpha/2$ s.t. $y^T \alpha = 0$, $0 \le \alpha \le 1$
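Going from dual to primal uses v = Aᵀα = Σᵢ αᵢ yᵢ xᵢ (stationarity of the Lagrangian in v, assuming row i of A is yᵢxᵢᵀ) together with complementary slackness. The sketch below, with the hypothetical name `recover_primal` and a tolerance `tol`, also labels each example by where αᵢ sits in the box [0, 1], which is one way to read the questions on the previous slide. It assumes some αᵢ lies strictly inside (0, 1); otherwise d has to be found another way.

```python
def recover_primal(X, y, alpha, tol=1e-8):
    """Recover (v, d) from a dual solution alpha (box 0 <= alpha_i <= 1) and label examples.

    v = A^T alpha = sum_i alpha_i y_i x_i. For any i with 0 < alpha_i < 1,
    complementary slackness forces s_i = 0 and y_i (x_i . v - d) = 1, so
    d = x_i . v - y_i (using y_i = +1 or -1).
    """
    m = len(X[0])
    v = [sum(a * yi * xi[j] for a, yi, xi in zip(alpha, y, X)) for j in range(m)]
    free = [i for i, a in enumerate(alpha) if tol < a < 1 - tol]
    if not free:
        raise ValueError("no alpha_i strictly inside (0, 1); d is not pinned down this way")
    i = free[0]
    d = sum(xj * vj for xj, vj in zip(X[i], v)) - y[i]
    labels = []
    for a in alpha:
        if a <= tol:
            labels.append("non-support vector")    # alpha_i = 0: margin >= 1, no slack
        elif a < 1 - tol:
            labels.append("free support vector")   # 0 < alpha_i < 1: margin exactly 1
        else:
            labels.append("bound support vector")  # alpha_i = 1: slack s_i may be > 0
    return v, d, labels
```

On the separable 2-D set {(1, 0): +1, (−1, 0): −1, (3, 0): +1}, the dual solution α = (0.5, 0.5, 0) (worked out by hand) gives v = (1, 0) and d = 0, with the third point labeled a non-support vector.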

SLIDE 18

A suboptimal support set

SLIDE 19

SVM duality: the applet

SLIDE 20

Why is the dual useful?

aka the kernel trick

  • SVM: $n$ examples, $m$ features
  • primal:
  • dual: $\max\ \mathbf{1}^T \alpha - \alpha^T A A^T \alpha/2$ s.t. $y^T \alpha = 0$, $0 \le \alpha \le 1$
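The dual touches the data only through the n × n matrix AAᵀ, whose entries are y_i y_j x_iᵀx_j (assuming row i of A is y_i x_iᵀ). Replacing the inner product x_iᵀx_j by a kernel function k(x_i, x_j) is the kernel trick. The sketch below builds that matrix for any kernel. The Gaussian kernel and its `gamma` parameter are an illustrative choice, not from the slides; all function names are hypothetical.

```python
import math


def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=1.0):
    # Gaussian kernel exp(-gamma * ||u - v||^2); gamma is an illustrative choice
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def dual_matrix(X, y, kernel=linear_kernel):
    """n x n matrix Q with Q_ij = y_i y_j k(x_i, x_j), the AA^T of the dual.

    With the linear kernel this equals AA^T exactly; its size depends on the
    number of examples n, not on the number of features m.
    """
    n = len(X)
    return [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
```

With the default linear kernel the result matches AAᵀ computed from A directly; with the Gaussian kernel every diagonal entry is 1, since k(x, x) = 1 and y_i² = 1.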

SLIDE 21

Interior-point methods


SLIDE 22

Ball center

aka Chebyshev center

  • $X = \{x \mid Ax + b \ge 0\}$
  • Ball center:
  • if $\|a_i\| = 1$:
  • in general:
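The ball (Chebyshev) center is the center of the largest ball that fits inside X. Writing row i of the constraints as a_iᵀx + b_i ≥ 0, the ball of radius r around x lies in X exactly when the worst point of the ball still satisfies every row, which yields a linear program. A sketch of the standard formulation (the symbols B(x, r) and u are ours):

```latex
\begin{aligned}
B(x, r) \subseteq X &\iff \min_{\|u\| \le r} a_i^T (x + u) + b_i \ge 0\ \ \forall i
  \iff a_i^T x + b_i \ge r \|a_i\|_2\ \ \forall i \\
\text{in general:}\quad &\max_{x,\, r}\ r \ \ \text{s.t.}\ \ a_i^T x + b_i \ge r \|a_i\|_2\ \ \forall i \\
\text{if } \|a_i\| = 1:\quad &\max_{x,\, r}\ r \ \ \text{s.t.}\ \ Ax + b \ge r\mathbf{1}
\end{aligned}
```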

SLIDE 23

Analytic center

  • Let $s = Ax + b$
  • Analytic center:
SLIDE 24

Bad conditioning? No problem.

SLIDE 25

Newton for analytic center

  • Lagrangian $L(x, s, y) = -\sum_i \ln s_i + y^T (s - Ax - b)$
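Eliminating s = Ax + b turns the analytic-center problem into unconstrained minimization of −Σᵢ ln(a_iᵀx + b_i) over the interior of X, with gradient −Aᵀ(1/s) and Hessian Aᵀ diag(1/s²) A. The sketch below runs Newton's method with a backtracking line search for x ∈ ℝ². It is a plain primal Newton method, not the primal-dual Newton iteration on the Lagrangian above; the function name and the 2-D restriction are ours. Since ln(c·s_i) = ln c + ln s_i, rescaling a row of Ax + b ≥ 0 by a positive constant does not move the analytic center, and the usage check confirms this.

```python
import math


def analytic_center_2d(A, b, x0, iters=50, tol=1e-12):
    """Damped Newton method for argmin_x -sum_i ln(a_i . x + b_i), with x in R^2.

    x0 must be strictly feasible and A must have rank 2 (so the Hessian is invertible).
    """
    def slacks(x):
        return [a[0] * x[0] + a[1] * x[1] + bi for a, bi in zip(A, b)]

    def f(x):
        return -sum(math.log(si) for si in slacks(x))

    x = list(x0)
    for _ in range(iters):
        s = slacks(x)
        # gradient -A^T (1/s) and Hessian A^T diag(1/s^2) A
        g = [-sum(a[k] / si for a, si in zip(A, s)) for k in range(2)]
        H = [[sum(a[k] * a[l] / si ** 2 for a, si in zip(A, s)) for l in range(2)]
             for k in range(2)]
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # Newton step dx = -H^{-1} g, using the explicit 2x2 inverse
        dx = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
              -(H[0][0] * g[1] - H[1][0] * g[0]) / det]
        decrement = -(g[0] * dx[0] + g[1] * dx[1])  # squared Newton decrement g^T H^{-1} g
        if decrement / 2 <= tol:
            break
        step = 1.0
        while True:  # backtracking: stay strictly feasible, then demand sufficient decrease
            xn = [x[0] + step * dx[0], x[1] + step * dx[1]]
            if min(slacks(xn)) > 0 and f(xn) <= f(x) - 0.25 * step * decrement:
                break
            step *= 0.5
        x = xn
    return x
```

For the box −1 ≤ x₁ ≤ 3, 0 ≤ x₂ ≤ 2, each coordinate's barrier is minimized at the midpoint, so the analytic center is (1, 1).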

SLIDE 26

Adding an objective

  • Analytic center was for $\{x \mid Ax + b = s \ge 0\}$
  • Now: $\min\ c^T x$ s.t. $Ax + b = s \ge 0$
  • Same trick:
  • $\min\ t\, c^T x - \sum_i \ln s_i$ s.t. $Ax + b = s \ge 0$
  • parameter $t \ge 0$
  • central path =
  • $t \to 0$: $t \to \infty$:
  • $L(x, s, y) =$
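Setting the gradient of the Lagrangian t cᵀx − Σᵢ ln s_i + yᵀ(s − Ax − b) to zero characterizes each point of the central path. It also produces a dual-feasible point for the LP with an explicit duality gap. The sketch below takes the LP dual from the slides' primal-dual pair with H = 0, K the nonnegative orthant and x free (so L* = {0}); m is the number of rows of A and λ = y/t is our notation. At t = 0 the objective term drops out and the minimizer is the analytic center; as t grows, the gap m/t shrinks to 0.

```latex
\begin{aligned}
\nabla_x L = 0 &\ \Rightarrow\ t\,c = A^T y \\
\nabla_s L = 0 &\ \Rightarrow\ y_i s_i = 1 \quad (\text{so } y > 0) \\
\nabla_y L = 0 &\ \Rightarrow\ s = Ax + b \\
\lambda = y/t &\ \Rightarrow\ \lambda \ge 0,\ A^T \lambda = c
  \quad (\text{feasible for } \max\ -b^T\lambda\ \ \text{s.t.}\ A^T\lambda = c,\ \lambda \ge 0) \\
c^T x - (-b^T \lambda) &= \lambda^T (Ax + b) = \tfrac1t \textstyle\sum_i y_i s_i = m/t
\end{aligned}
```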

SLIDE 27

Newton for central path

  • $L(x, s, y) = t\, c^T x - \sum_i \ln s_i + y^T (s - Ax - b)$
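A minimal path-following sketch, assuming the simplest possible LP: minimize c·x over an interval lo ≤ x ≤ hi, so there are m = 2 constraints and Newton's method is scalar. For each t it minimizes t·c·x − ln(x − lo) − ln(hi − x), warm-started at the previous center, then multiplies t by `mu`. This is a primal barrier method standing in for the primal-dual Newton step on the Lagrangian above. The function name and the parameters `t0`, `mu`, `t_max` are hypothetical.

```python
def barrier_1d(c, lo, hi, t0=1.0, mu=10.0, t_max=1e8):
    """Follow the central path of min c*x s.t. x - lo >= 0, hi - x >= 0 (so m = 2).

    For each t, Newton's method minimizes t*c*x - ln(x - lo) - ln(hi - x),
    warm-started at the previous center; then t is multiplied by mu.
    Returns the visited central-path points as (t, x) pairs.
    """
    x = (lo + hi) / 2  # analytic center of the interval (the t = 0 point)
    path = []
    t = t0
    while t <= t_max:
        for _ in range(100):
            g = t * c - 1 / (x - lo) + 1 / (hi - x)    # first derivative of the objective
            h = 1 / (x - lo) ** 2 + 1 / (hi - x) ** 2  # second derivative (> 0)
            if g * g / h < 1e-20:                      # squared Newton decrement
                break
            step = -g / h
            while not (lo < x + step < hi):            # keep the iterate strictly feasible
                step /= 2
            x += step
        path.append((t, x))
        t *= mu
    return path
```

For c = 1 on [0, 1] the optimum is x = 0, and the central-path points move monotonically toward it. At every visited t the suboptimality c·x − 0 stays within the m/t = 2/t bound from the duality-gap calculation for the central path.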
