SLIDE 1

Convex Programs

COMPSCI 371D — Machine Learning

SLIDE 2


Support Vector Machines (SVMs) and Convex Programs

  • SVMs are linear predictors in their original form
  • Defined for both regression and classification
  • Multi-class versions exist
  • We will cover only binary SVM classification
  • Why do we need another linear classifier?
  • We’ll need some new math: Convex Programs
  • Optimization of convex functions with affine constraints

SLIDE 3


Outline

  1. Logistic Regression → Support Vector Machines
  2. Local Convex Minimization → Convex Programs
  3. Shape of the Solution Set
  4. The Karush-Kuhn-Tucker Conditions

SLIDE 4


Logistic Regression → SVMs

  • A logistic-regression classifier places the decision boundary somewhere (and only approximately) between the two classes
  • The loss is never zero → the exact location of the boundary can be determined by samples that are very distant from it (even samples on the correct side)
  • SVMs place the boundary “exactly half-way” between the two classes (with exceptions that allow for classes that are not linearly separable)
  • Only samples close to the boundary matter: these are the support vectors
  • A “kernel trick” allows going beyond linear classifiers
  • We only look at the binary case

SLIDE 5


Roadmap for SVMs

  • SVM training minimizes a convex function with constraints
  • Convex: unique minimum risk
  • Constraints: they define a convex program, the minimization of a convex function subject to affine constraints
  • Representer theorem: the SVM hyperplane normal vector is a linear combination of a subset of the training samples (x_n, y_n); the x_n in that subset are the support vectors
  • The proof of the representer theorem is based on a characterization of the solutions of a convex program
  • Characterization for an unconstrained problem: ∇f(u) = 0
  • Characterization for a convex program: the Karush-Kuhn-Tucker (KKT) conditions
  • The representer theorem leads to the kernel trick, through which SVMs can be turned into nonlinear classifiers
  • The decision boundary is then no longer necessarily a hyperplane

SLIDE 6


Roadmap Summary

Convex program → SVM formulation
KKT conditions → representer theorem → kernel trick

SLIDE 7


Local Convex Minimization → Convex Programs

  • Convex function f : ℝ^m → ℝ
  • f differentiable, with continuous first derivatives
  • Unconstrained minimization: u* ∈ arg min_{u ∈ ℝ^m} f(u)
  • Constrained minimization: u* ∈ arg min_{u ∈ C} f(u)
  • C = {u ∈ ℝ^m : Au + b ≥ 0}
  • f is a convex function
  • C is a convex set: if u, v ∈ C, then tu + (1 − t)v ∈ C for all t ∈ [0, 1]
  • This specific C is bounded by hyperplanes
  • This is a convex program

SLIDE 8


Convex Program

u* ∈ arg min_{u ∈ C} f(u)   where   C ≝ {u ∈ ℝ^m : c(u) ≥ 0}

  • f differentiable, with continuous gradient, and convex
  • The k inequalities in C are affine: c(u) = Au + b ≥ 0
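As a concrete companion to this definition, here is a minimal sketch (my own example, not from the slides) that solves one such program with SciPy; the objective f(u) = ‖u‖² and the constraint data A, b are made up:

```python
# Minimal sketch (not from the slides): solve a tiny convex program
#   minimize f(u) = ||u||^2   subject to   A u + b >= 0
# with SciPy. The objective and the constraint data A, b are made up.
import numpy as np
from scipy.optimize import LinearConstraint, minimize

def f(u):
    return float(np.sum(u ** 2))   # convex objective

def grad_f(u):
    return 2.0 * u                 # continuous gradient

A = np.array([[1.0, 1.0]])         # one affine constraint: u_1 + u_2 - 1 >= 0
b = np.array([-1.0])

# A u + b >= 0  is the same as  A u >= -b
constraint = LinearConstraint(A, -b, np.inf)

result = minimize(f, x0=np.zeros(2), jac=grad_f,
                  constraints=[constraint], method="trust-constr")
print(result.x)                    # about [0.5, 0.5], on the boundary of C
```

The minimizer lands on the boundary of C, which is exactly the situation the KKT conditions below are designed to characterize.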

SLIDE 9


Shape of the Solution Set

  • Just as for the unconstrained problem:
  • There is one f* but there can be multiple u* (a flat valley)
  • The set of solution points u* is convex
  • If f is strictly convex at u*, then u* is the unique solution point
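A quick numeric sketch of the flat valley (the function is my example, not from the slides): f(u) = (u_1 + u_2 − 1)² is convex but not strictly convex, its minimizers fill the line u_1 + u_2 = 1, and any convex combination of two minimizers is again a minimizer:

```python
# Flat-valley sketch: f(u) = (u_1 + u_2 - 1)^2 is convex but not strictly
# convex; every point on the line u_1 + u_2 = 1 is a minimizer.
import numpy as np

def f(u):
    return (u[0] + u[1] - 1.0) ** 2

u_a = np.array([1.0, 0.0])           # one minimizer
u_b = np.array([0.0, 1.0])           # another minimizer
for t in np.linspace(0.0, 1.0, 5):
    u = t * u_a + (1.0 - t) * u_b    # convex combination stays in the valley
    print(t, f(u))                   # f is 0 along the whole segment
```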

SLIDE 10


Zero Gradient → KKT Conditions

  • For the unconstrained problem, the solution is characterized by ∇f(u) = 0

  • Constraints can generate new minima and maxima
  • Example: f(u) = e^u (see the plots and the numeric sketch below)

[Figure: three plots of f(u) = e^u over u, with a constraint boundary marked at u = 1]

  • What is the new characterization?
  • Karush-Kuhn-Tucker conditions, necessary and sufficient
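A numeric version of the example above (reading the plots as imposing the constraint u ≥ 1, which is my assumption): e^u has no unconstrained minimizer, but on the feasible set the minimum sits at the boundary u = 1, where the gradient e¹ ≈ 2.72 is not zero, so ∇f(u) = 0 can no longer be the characterization:

```python
# Sketch: f(u) = e^u has no unconstrained minimizer, but with the affine
# constraint u - 1 >= 0 (an assumed reading of the slide's plots) the
# minimum appears at the boundary u = 1, where grad f = e != 0.
import numpy as np
from scipy.optimize import LinearConstraint, minimize

f = lambda u: float(np.exp(u[0]))
constraint = LinearConstraint(np.array([[1.0]]), 1.0, np.inf)   # u >= 1
result = minimize(f, x0=np.array([2.0]),
                  constraints=[constraint], method="trust-constr")
print(result.x, np.exp(result.x))   # u* ~ 1, gradient ~ 2.718, not zero
```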

SLIDE 11


Regular Points

[Figure: a regular point u on the constraint boundary, with the gradient ∇f, a direction s, and the half-spaces H+ and H−]

SLIDE 12


Corner Points

[Two figures: a corner point u of the feasible set C where constraints c_1 and c_2 are active, shown with ∇f, a direction s, and the half-spaces H+ and H−]

SLIDE 13


The Convex Cone of the Constraint Gradients

[Figure: ∇f at u relative to the cone of the gradients of the active constraints c_1 and c_2, with the half-spaces H+ and H−]

SLIDE 14


Inactive Constraints Do Not Matter

[Two figures: a point u of C with active constraints c_1, c_2 and an inactive constraint c_3, which does not affect the characterization]

SLIDE 15


Conic Combinations

[Figure: generators a_1, a_2 with normals n_1, n_2, and a vector v in the cone they span]

{v : v = α_1 a_1 + α_2 a_2 with α_1, α_2 ≥ 0}
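One way to test conic membership numerically (my own illustration, not from the slides) is non-negative least squares: v lies in the cone of a_1 and a_2 exactly when the best combination with α_1, α_2 ≥ 0 reconstructs v with zero residual:

```python
# Sketch: decide whether v is a conic combination of a_1 and a_2 by
# solving a non-negative least-squares problem; zero residual means
# v lies in the cone.
import numpy as np
from scipy.optimize import nnls

a1 = np.array([1.0, 0.0])
a2 = np.array([1.0, 1.0])
A = np.column_stack([a1, a2])        # columns are the cone generators

v_inside = np.array([2.0, 1.0])      # = 1*a1 + 1*a2, in the cone
v_outside = np.array([-1.0, 0.5])    # not reachable with alpha >= 0

for v in (v_inside, v_outside):
    alpha, residual = nnls(A, v)     # alpha >= 0 enforced by nnls
    print(alpha, residual)           # residual ~ 0 iff v is in the cone
```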

SLIDE 16


The KKT Conditions

u ∈ C is a solution to a convex program iff there exist α_i such that

∇f(u) = Σ_{i ∈ A(u)} α_i ∇c_i(u)   with   α_i ≥ 0

where A(u) = {i : c_i(u) = 0} is the active set at u

[Figure: ∇f at u contained in the cone of the active constraint gradients c_1, c_2, with the half-spaces H+ and H−]

Convention: a sum over i ∈ ∅ is 0 (so the condition also holds in the interior of C)
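To make the condition concrete, here is a hedged sketch (the program is my earlier made-up example; the test mirrors the condition on this slide): find the active set A(u), then check whether ∇f(u) is a non-negative combination of the active constraint gradients:

```python
# KKT check for the example program  min ||u||^2  s.t.  u_1 + u_2 - 1 >= 0
# (my made-up example): at a candidate u, collect the gradients of the
# active constraints and test whether grad f(u) lies in their convex cone.
import numpy as np
from scipy.optimize import nnls

grad_f = lambda u: 2.0 * u                 # gradient of f(u) = ||u||^2
A = np.array([[1.0, 1.0]])                 # affine constraints c(u) = A u + b
b = np.array([-1.0])

def kkt_holds(u, tol=1e-8):
    active = np.abs(A @ u + b) <= tol      # active set A(u): c_i(u) = 0
    if not active.any():
        return np.allclose(grad_f(u), 0.0) # interior point: need grad f = 0
    G = A[active].T                        # columns are active gradients
    alpha, residual = nnls(G, grad_f(u))   # best alpha_i >= 0
    return residual <= tol                 # zero residual = KKT satisfied

print(kkt_holds(np.array([0.5, 0.5])))     # True: the actual solution
print(kkt_holds(np.array([1.0, 0.0])))     # False: feasible but suboptimal
```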

COMPSCI 371D — Machine Learning Convex Programs 16 / 16