SLIDE 1

Applications (I)

Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj

SLIDE 2

Outline

 Norm Approximation
   Basic Norm Approximation
   Penalty Function Approximation
   Approximation with Constraints
 Least-norm Problems
 Regularized Approximation
 Classification
   Linear Discrimination
   Support Vector Classifier
   Logistic Regression

SLIDE 3

Basic Norm Approximation

 Norm Approximation Problem

$$\min_y \; \|By - c\|$$

 $B \in \mathbf{R}^{n \times o}$ and $c \in \mathbf{R}^n$ are problem data
 $y \in \mathbf{R}^o$ is the variable
 $\|\cdot\|$ is a norm on $\mathbf{R}^n$
 An approximate solution of $By \approx c$, in the norm $\|\cdot\|$
 Residual: $s = By - c$
 A convex problem
 If $c \in \mathcal{R}(B)$, the optimal value is $0$
 If $c \notin \mathcal{R}(B)$, the problem is more interesting
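The deck states the problem abstractly; as a minimal sketch, it can be posed directly with the cvxpy modeling library (an assumed dependency here, with placeholder data and the Euclidean norm chosen just for illustration):

```python
import cvxpy as cp
import numpy as np

# Placeholder data: 100 measurements, 30 variables (c is not in range(B)).
rng = np.random.default_rng(0)
B = rng.standard_normal((100, 30))
c = rng.standard_normal(100)

y = cp.Variable(30)            # the variable y
residual = B @ y - c           # s = By - c
prob = cp.Problem(cp.Minimize(cp.norm(residual, 2)))
prob.solve()
print(prob.value)              # optimal value of ||By - c||_2
```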

SLIDE 4

Basic Norm Approximation

 Approximation Interpretation

$$By = y_1 b_1 + \cdots + y_o b_o$$

 $b_1, \ldots, b_o$ are the columns of $B$
 Approximate the vector $c$ by a linear combination of the columns of $B$
 Regression problem
 $b_1, \ldots, b_o$ are the regressors
 $y_1 b_1 + \cdots + y_o b_o$ is the regression of $c$

SLIDE 5

Basic Norm Approximation

 Estimation Interpretation

 Consider a linear measurement model
$$z = By + w$$
 $z \in \mathbf{R}^n$ is a vector of measurements
 $y \in \mathbf{R}^o$ is a vector of parameters to be estimated
 $w$ is some measurement error that is unknown, but presumed to be small
 Assume smaller values of $w$ are more plausible
 The most plausible guess for the parameters is then
$$\hat{y} = \operatorname{argmin}_y \; \|By - z\|$$

SLIDE 6

Basic Norm Approximation

 Geometric Interpretation

 Consider the subspace $\mathcal{B} = \mathcal{R}(B) \subseteq \mathbf{R}^n$, and a point $c \in \mathbf{R}^n$
 A projection of the point $c$ onto the subspace $\mathcal{B}$, in the norm $\|\cdot\|$:
$$\begin{array}{ll} \min & \|v - c\| \\ \text{s.t.} & v \in \mathcal{B} \end{array}$$
 Parametrizing an arbitrary element of $\mathcal{B}$ as $v = By$, we see that norm approximation is equivalent to projection

SLIDE 7

Basic Norm Approximation

 Weighted Norm Approximation Problems

$$\min_y \; \|X(By - c)\|$$

 $X$ is called the weighting matrix
 A norm approximation problem with norm $\|\cdot\|$, and data $\tilde{B} = XB$, $\tilde{c} = Xc$
 A norm approximation problem with data $B$ and $c$, and the weighted norm $\|u\|_X = \|Xu\|$

SLIDE 8

Basic Norm Approximation

 Least-Squares Approximation

$$\min_y \; \|By - c\|_2^2 = s_1^2 + s_2^2 + \cdots + s_n^2$$

 The minimization of a convex quadratic function
$$g(y) = y^T B^T B y - 2 c^T B y + c^T c$$
 A point $y$ minimizes $g$ if and only if
$$\nabla g(y) = 2 B^T B y - 2 B^T c = 0$$
 Normal equations: $B^T B y = B^T c$
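A minimal numpy sketch of the least-squares case (placeholder data; np.linalg.lstsq is shown alongside the explicit normal equations because it is the numerically preferable route):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((100, 30))   # placeholder data
c = rng.standard_normal(100)

# Normal equations: B^T B y = B^T c.
y_normal = np.linalg.solve(B.T @ B, B.T @ c)

# Equivalent for full-column-rank B, and more robust numerically.
y_lstsq, *_ = np.linalg.lstsq(B, c, rcond=None)

print(np.allclose(y_normal, y_lstsq))  # True here
```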
SLIDE 9

Basic Norm Approximation

 Chebyshev or Minimax Approximation

$$\min_y \; \|By - c\|_\infty = \max\{|s_1|, \ldots, |s_n|\}$$

 Can be cast as an LP
$$\begin{array}{ll} \min & u \\ \text{s.t.} & -u\mathbf{1} \preceq By - c \preceq u\mathbf{1} \end{array}$$
with variables $y \in \mathbf{R}^o$ and $u \in \mathbf{R}$

 Sum of Absolute Residuals Approximation

$$\min_y \; \|By - c\|_1 = |s_1| + \cdots + |s_n|$$

 Can be cast as an LP
$$\begin{array}{ll} \min & \mathbf{1}^T u \\ \text{s.t.} & -u \preceq By - c \preceq u \end{array}$$
with variables $y \in \mathbf{R}^o$ and $u \in \mathbf{R}^n$
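In practice a modeling layer performs these LP reformulations automatically, so the auxiliary variable $u$ never has to be introduced by hand. A minimal cvxpy sketch of both problems (placeholder data):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
B, c = rng.standard_normal((100, 30)), rng.standard_normal(100)
y = cp.Variable(30)

# Chebyshev (minimax) approximation: minimize the largest |s_i|.
cp.Problem(cp.Minimize(cp.norm(B @ y - c, "inf"))).solve()
print(np.max(np.abs(B @ y.value - c)))

# Sum-of-absolute-residuals (l1) approximation.
cp.Problem(cp.Minimize(cp.norm(B @ y - c, 1))).solve()
print(np.sum(np.abs(B @ y.value - c)))
```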

SLIDE 10

Outline

 Norm Approximation
   Basic Norm Approximation
   Penalty Function Approximation
   Approximation with Constraints
 Least-norm Problems
 Regularized Approximation
 Classification
   Linear Discrimination
   Support Vector Classifier
   Logistic Regression

SLIDE 11
$\ell_p$-norm Approximation

 $\ell_p$-norm approximation, for $1 \le p < \infty$:
$$\min_y \; \|By - c\|_p = \left( |s_1|^p + \cdots + |s_n|^p \right)^{1/p}$$
 The equivalent problem with objective
$$|s_1|^p + \cdots + |s_n|^p$$
 A separable and symmetric function of the residuals
 The objective depends only on the amplitude distribution of the residuals

SLIDE 12

Penalty Function Approximation

 The Problem

$$\begin{array}{ll} \min & \varrho(s_1) + \cdots + \varrho(s_n) \\ \text{s.t.} & s = By - c \end{array}$$

 $\varrho: \mathbf{R} \to \mathbf{R}$ is called the penalty function
 $\varrho$ is convex
 Often, $\varrho$ is symmetric, nonnegative, and satisfies $\varrho(0) = 0$
 A penalty function assesses a cost or penalty for each component of the residual

SLIDE 13

Example

 Quadratic penalty ($\ell_2$-norm approximation): $\varrho(v) = v^2$
 Absolute value penalty ($\ell_1$-norm approximation): $\varrho(v) = |v|$
 Deadzone-linear penalty function:
$$\varrho(v) = \begin{cases} 0 & |v| \le b \\ |v| - b & |v| > b \end{cases}$$
 The log barrier penalty function:
$$\varrho(v) = \begin{cases} -b^2 \log\left(1 - (v/b)^2\right) & |v| < b \\ \infty & |v| \ge b \end{cases}$$
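As a small illustration, the four penalties can be written as vectorized Python functions (numpy assumed; the half-width b is a placeholder parameter):

```python
import numpy as np

def quadratic(v):
    return v ** 2

def absolute(v):
    return np.abs(v)

def deadzone_linear(v, b=1.0):
    # Zero penalty inside [-b, b], linear growth outside.
    return np.maximum(np.abs(v) - b, 0.0)

def log_barrier(v, b=1.0):
    # Infinite penalty outside (-b, b).
    out = np.full_like(v, np.inf, dtype=float)
    inside = np.abs(v) < b
    out[inside] = -b**2 * np.log(1.0 - (v[inside] / b) ** 2)
    return out

v = np.linspace(-0.9, 0.9, 7)
print(deadzone_linear(v, b=0.5))
print(log_barrier(v, b=1.0))
```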

SLIDE 14

Example

 The log barrier penalty function assesses an infinite penalty for residuals larger than $b$
 The log barrier function is very close to the quadratic penalty for $|v/b| \le 0.25$

SLIDE 15

Discussions

 Roughly speaking, $\varrho(v)$ is a measure of our dislike of a residual of value $v$
 If $\varrho(v)$ is very small for small $v$, it means we care very little if residuals have these values
 If $\varrho(v)$ grows rapidly as $v$ becomes large, it means we have a strong dislike for large residuals
 If $\varrho(v)$ becomes infinite outside some interval, it means that residuals outside the interval are unacceptable

SLIDE 16

Discussions

 For small $v$ we have $\varrho_{\mathrm{abs}}(v) \gg \varrho_{\mathrm{quad}}(v)$, so $\ell_1$-norm approximation puts relatively larger emphasis on small residuals
 The optimal residual for the $\ell_1$-norm approximation problem will tend to have more zero and very small residuals
 For large $v$ we have $\varrho_{\mathrm{quad}}(v) \gg \varrho_{\mathrm{abs}}(v)$, so $\ell_1$-norm approximation puts less weight on large residuals
 The $\ell_2$-norm solution will tend to have relatively fewer large residuals

SLIDE 17

Example


SLIDE 18

Observations of Penalty Functions

 The $\ell_1$-norm penalty puts the most weight on small residuals and the least weight on large residuals.
 The $\ell_2$-norm penalty puts very small weight on small residuals, but strong weight on large residuals.
 The deadzone-linear penalty function puts no weight on residuals smaller than 0.5, and relatively little weight on large residuals.
 The log barrier penalty puts weight very much like the $\ell_2$-norm penalty for small residuals, but puts very strong weight on residuals larger than around 0.8, and infinite weight on residuals larger than 1.

SLIDE 19

Observations of Amplitude Distributions

 For the $\ell_1$-optimal solution, many residuals are either zero or very small. The $\ell_1$-optimal solution also has relatively more large residuals.
 The $\ell_2$-norm approximation has many modest residuals, and relatively few larger ones.
 For the deadzone-linear penalty, many residuals have the value 0.5, right at the edge of the 'free' zone, for which no penalty is assessed.
 For the log barrier penalty, no residuals have a magnitude larger than 1, but otherwise the residual distribution is similar to that for $\ell_2$-norm approximation.

SLIDE 20

Outline

 Norm Approximation
   Basic Norm Approximation
   Penalty Function Approximation
   Approximation with Constraints
 Least-norm Problems
 Regularized Approximation
 Classification
   Linear Discrimination
   Support Vector Classifier
   Logistic Regression

SLIDE 21

Approximation with Constraints

 Add Constraints to the Basic Norm Approximation Problem
$$\min_y \; \|By - c\|$$
 Rule out certain unacceptable approximations of the vector $c$
 Ensure that the approximator $By$ satisfies certain properties
 Express prior knowledge of the vector $y$ to be estimated
 Express prior knowledge of the estimation error
 Determine the projection of a point $c$ on a set more complicated than a subspace

SLIDE 22

Approximation with Constraints

 Nonnegativity Constraints on Variables

$$\begin{array}{ll} \min & \|By - c\| \\ \text{s.t.} & y \succeq 0 \end{array}$$

 Estimate a vector of parameters known to be nonnegative
 Determine the projection of the vector $c$ onto the cone generated by the columns of $B$
 Approximate $c$ using a nonnegative linear combination of the columns of $B$
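A minimal cvxpy sketch of this nonnegative approximation problem (placeholder data):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
B, c = rng.standard_normal((50, 10)), rng.standard_normal(50)

y = cp.Variable(10)
prob = cp.Problem(cp.Minimize(cp.norm(B @ y - c, 2)), [y >= 0])
prob.solve()
print(y.value.min())  # nonnegative up to solver tolerance
```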

SLIDE 23

Approximation with Constraints

 Variable Bounds

$$\begin{array}{ll} \min & \|By - c\| \\ \text{s.t.} & m \preceq y \preceq v \end{array}$$

 Express prior knowledge of intervals in which each variable lies
 Determine the projection of the vector $c$ onto the image of a box under the linear mapping induced by $B$

SLIDE 24

Approximation with Constraints

 Probability Distribution

$$\begin{array}{ll} \min & \|By - c\| \\ \text{s.t.} & y \succeq 0, \; \mathbf{1}^T y = 1 \end{array}$$

 Estimation of proportions or relative frequencies
 Approximate $c$ by a convex combination of the columns of $B$

 Norm Ball Constraint

$$\begin{array}{ll} \min & \|By - c\| \\ \text{s.t.} & \|y - \hat{y}\| \le e \end{array}$$

 $\hat{y}$ is a prior guess of what the parameter $y$ is, and $e$ is the maximum plausible deviation
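A minimal cvxpy sketch of the probability-distribution variant, where y is constrained to the probability simplex (placeholder data):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
B, c = rng.standard_normal((40, 8)), rng.standard_normal(40)

y = cp.Variable(8)
constraints = [y >= 0, cp.sum(y) == 1]   # y lies on the probability simplex
cp.Problem(cp.Minimize(cp.norm(B @ y - c, 2)), constraints).solve()
print(y.value.sum(), y.value.min())      # ~1.0 and >= 0 up to tolerance
```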

SLIDE 25

Outline

 Norm Approximation
   Basic Norm Approximation
   Penalty Function Approximation
   Approximation with Constraints
 Least-norm Problems
 Regularized Approximation
 Classification
   Linear Discrimination
   Support Vector Classifier
   Logistic Regression

SLIDE 26

Least-norm Problems

 Basic Least-norm Problem

$$\begin{array}{ll} \min & \|y\| \\ \text{s.t.} & By = c \end{array}$$

 $\|\cdot\|$ is a norm on $\mathbf{R}^o$
 The solution is called a least-norm solution of $By = c$
 A convex optimization problem
 Interesting when $n < o$, i.e., the equations are underdetermined

SLIDE 27

Least-norm Problems

 Reformulation as Norm Approximation Problem

 Let $\hat{y}$ be any solution of $By = c$
 Let $Z$ be a matrix whose columns are a basis for the nullspace of $B$, so that
$$\{y \mid By = c\} = \{\hat{y} + Zv \mid v \in \mathbf{R}^k\}$$
 The least-norm problem can then be expressed as
$$\min_v \; \|\hat{y} + Zv\|$$

SLIDE 28

Least-norm Problems

 Estimation Interpretation

 We have $n < o$ perfect linear measurements, given by $By = c$
 Our measurements do not completely determine $y$
 Suppose our prior information is that $y$ is more likely to be small than large
 Choose the parameter vector $y$ which is smallest among all parameter vectors consistent with the measurements

SLIDE 29

Least-norm Problems

 Geometric Interpretation

 The feasible set $\{y \mid By = c\}$ is affine
 The objective $\|y\|$ is the distance between $y$ and the point $0$
 Find the point in the affine set with minimum distance to $0$
 Determine the projection of the point $0$ on the affine set
SLIDE 30

Least-norm Problems

 Least-squares Solution of Linear Equations

$$\begin{array}{ll} \min & \|y\|_2^2 \\ \text{s.t.} & By = c \end{array}$$

 The optimality conditions ($w$ is the dual variable):
$$2y^* + B^T w^* = 0, \qquad By^* = c$$
 Eliminating $y^*$:
$$y^* = -\tfrac{1}{2} B^T w^*, \qquad -\tfrac{1}{2} B B^T w^* = c$$
 The solution:
$$w^* = -2 (B B^T)^{-1} c, \qquad y^* = B^T (B B^T)^{-1} c$$
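A minimal numpy sketch of this closed-form solution for an underdetermined system (placeholder data; the pseudoinverse route is shown only as a cross-check, since it computes the same point):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((10, 30))   # fat matrix: 10 equations, 30 unknowns
c = rng.standard_normal(10)

y_star = B.T @ np.linalg.solve(B @ B.T, c)         # y* = B^T (B B^T)^{-1} c
print(np.allclose(B @ y_star, c))                  # feasibility: By* = c
print(np.allclose(y_star, np.linalg.pinv(B) @ c))  # same as pseudoinverse
```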

SLIDE 31

Least-norm Problems

 Least-penalty Problems

$$\begin{array}{ll} \min & \varrho(y_1) + \cdots + \varrho(y_o) \\ \text{s.t.} & By = c \end{array}$$

 $\varrho: \mathbf{R} \to \mathbf{R}$ is convex, nonnegative, and satisfies $\varrho(0) = 0$
 The penalty function value $\varrho(u)$ quantifies our dislike of a component of $y$ having value $u$
 Find the $y$ that has least total penalty, subject to the constraint $By = c$

SLIDE 32

Least-norm Problems

 Sparse Solutions via Least $\ell_1$-norm

$$\begin{array}{ll} \min & \|y\|_1 \\ \text{s.t.} & By = c \end{array}$$

 Tends to produce a solution $y$ with a large number of components equal to zero
 Tends to produce sparse solutions of $By = c$, often with $n$ nonzero components
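A minimal cvxpy sketch of this least $\ell_1$-norm problem (often called basis pursuit), with a planted sparse solution as placeholder data:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((15, 30))
y_true = np.zeros(30)
y_true[:3] = [1.0, -2.0, 0.5]          # planted 3-sparse solution
c = B @ y_true

y = cp.Variable(30)
cp.Problem(cp.Minimize(cp.norm(y, 1)), [B @ y == c]).solve()
print(np.sum(np.abs(y.value) > 1e-6))  # number of (numerically) nonzero entries
```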

SLIDE 33

Least-norm Problems

 Sparse Solutions via Exhaustive Search
 Find solutions of $By = c$ that have only $l$ nonzero components
 For each candidate sparsity pattern, let $\tilde{B}$ be the submatrix of $B$ formed by the selected columns, and $\tilde{y}$ the corresponding subvector of $y$
 Solve
$$\begin{array}{ll} \min & \|\tilde{y}\| \\ \text{s.t.} & \tilde{B}\tilde{y} = c \end{array}$$
 If there is a solution, we are done
 Complexity: $\binom{o}{l}$ least-norm problems, which grows combinatorially

SLIDE 34

Outline

 Norm Approximation
   Basic Norm Approximation
   Penalty Function Approximation
   Approximation with Constraints
 Least-norm Problems
 Regularized Approximation
 Classification
   Linear Discrimination
   Support Vector Classifier
   Logistic Regression

SLIDE 35

Bi-criterion Formulation

 A (convex) Vector Optimization Problem with Two Objectives

$$\min \; (\text{w.r.t. } \mathbf{R}^2_+) \quad \left( \|By - c\|, \; \|y\| \right)$$

 Find a vector $y$ that is small
 Make the residual $By - c$ small
 Optimal trade-off between the two objectives
 The minimum value of $\|y\|$ is $0$, attained at $y = 0$, and the residual norm is then $\|c\|$
 Let $D$ denote the set of minimizers of $\|By - c\|$; then any minimum-norm point in $D$ is Pareto optimal

SLIDE 36

Regularization

 Weighted Sum of the Objectives

$$\min_y \; \|By - c\| + \delta \|y\|$$

 $\delta > 0$ is a problem parameter
 A common scalarization method used to solve the bi-criterion problem
 As $\delta$ varies over $(0, \infty)$, the solution traces out the optimal trade-off curve

 Weighted Sum of Squared Norms

$$\min_y \; \|By - c\|^2 + \delta \|y\|^2$$

SLIDE 37

Regularization

 Tikhonov Regularization

$$\min_y \; \|By - c\|_2^2 + \varepsilon \|y\|_2^2 = y^T (B^T B + \varepsilon J) y - 2 c^T B y + c^T c$$

 Analytical solution ($J$ denotes the identity matrix):
$$y = (B^T B + \varepsilon J)^{-1} B^T c$$
 Since $B^T B + \varepsilon J \succ 0$ for any $\varepsilon > 0$, the Tikhonov regularized least-squares solution requires no rank assumptions on the matrix $B$
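A minimal numpy sketch of the analytical Tikhonov solution (eps is a placeholder regularization weight):

```python
import numpy as np

rng = np.random.default_rng(0)
B, c = rng.standard_normal((100, 30)), rng.standard_normal(100)
eps = 0.1

# y = (B^T B + eps*J)^{-1} B^T c, with J the identity matrix.
J = np.eye(B.shape[1])
y = np.linalg.solve(B.T @ B + eps * J, B.T @ c)
print(np.linalg.norm(B @ y - c), np.linalg.norm(y))
```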

SLIDE 38

Regularization

 $\ell_1$-norm Regularization

$$\min_y \; \|By - c\|_2 + \delta \|y\|_1$$

 Finds a sparse solution
 The residual is measured with the Euclidean norm and the regularization is done with an $\ell_1$-norm
 By varying the parameter $\delta$ we can sweep out the optimal trade-off curve between $\|By - c\|_2$ and $\|y\|_1$
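A minimal cvxpy sketch that sweeps $\delta$ over a small grid to trace the trade-off between the two terms (placeholder data and grid):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
B, c = rng.standard_normal((50, 20)), rng.standard_normal(50)
y = cp.Variable(20)

for d in [0.01, 0.1, 1.0, 10.0]:
    cp.Problem(cp.Minimize(cp.norm(B @ y - c, 2) + d * cp.norm(y, 1))).solve()
    nnz = int(np.sum(np.abs(y.value) > 1e-6))       # sparsity grows with delta
    print(d, nnz, np.linalg.norm(B @ y.value - c))
```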

SLIDE 39

Example

 Regressor Selection Problem

$$\begin{array}{ll} \min & \|By - c\| \\ \text{s.t.} & \mathbf{card}(y) \le l \end{array}$$

 One straightforward approach is to check every possible sparsity pattern in $y$ with $l$ nonzero entries
 For a fixed sparsity pattern, we can find the optimal $y$ by solving a least-squares problem
 Complexity: $\binom{o}{l}$ least-squares problems

SLIDE 40

Example

 Regressor Selection Problem

 A good heuristic approach is to solve, for different values of $\delta$,
$$\min_y \; \|By - c\|_2 + \delta \|y\|_1$$
 Find the smallest value of $\delta$ that results in a solution with $\mathbf{card}(y) = l$
 We then fix this sparsity pattern and find the value of $y$ that minimizes $\|By - c\|_2$
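A minimal sketch of the two-stage heuristic (cvxpy assumed; the data, the $\delta$ grid, and the target cardinality l are placeholders):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
B, c = rng.standard_normal((60, 25)), rng.standard_normal(60)
l = 5                                   # target number of regressors

y = cp.Variable(25)
# Stage 1: increase delta until the l1-regularized solution is sparse enough.
support = np.ones(25, dtype=bool)
for d in np.logspace(-3, 1, 40):
    cp.Problem(cp.Minimize(cp.norm(B @ y - c, 2) + d * cp.norm(y, 1))).solve()
    support = np.abs(y.value) > 1e-6
    if support.sum() <= l:
        break

# Stage 2: fix the sparsity pattern and refit by least squares.
y_refit = np.zeros(25)
y_refit[support], *_ = np.linalg.lstsq(B[:, support], c, rcond=None)
print(support.sum(), np.linalg.norm(B @ y_refit - c))
```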

SLIDE 41

Example

SLIDE 42

Outline

 Norm Approximation
   Basic Norm Approximation
   Penalty Function Approximation
   Approximation with Constraints
 Least-norm Problems
 Regularized Approximation
 Classification
   Linear Discrimination
   Support Vector Classifier
   Logistic Regression

SLIDE 43

Classification

 Given two sets of points $\{y_1, \ldots, y_O\}$ and $\{z_1, \ldots, z_N\}$ in $\mathbf{R}^o$
 Find a function $g$ that is positive on the first set and negative on the second:
$$g(y_j) > 0, \; j = 1, \ldots, O, \qquad g(z_j) < 0, \; j = 1, \ldots, N$$
 $g$, or its 0-level set $\{v \mid g(v) = 0\}$, separates, classifies, or discriminates the two sets of points
SLIDE 44

Linear Discrimination

 Affine function: $g(v) = b^T v - c$
 A hyperplane $\{v \mid b^T v = c\}$ that separates the two sets of points:
$$b^T y_j - c > 0, \; j = 1, \ldots, O, \qquad b^T z_j - c < 0, \; j = 1, \ldots, N$$
 The strict inequalities are homogeneous in $b$ and $c$, so they are feasible if and only if the following nonstrict conditions are
 Equivalent conditions:
$$b^T y_j - c \ge 1, \; j = 1, \ldots, O, \qquad b^T z_j - c \le -1, \; j = 1, \ldots, N$$
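A minimal cvxpy sketch that checks linear separability by solving the feasibility problem with the normalized inequalities; the rows of Y and Z stack the points y_j and z_j, and the toy clouds are shifted apart so a separating hyperplane exists:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 2)) + np.array([3.0, 3.0])  # first point set
Z = rng.standard_normal((20, 2))                          # second point set

b, c = cp.Variable(2), cp.Variable()
constraints = [Y @ b - c >= 1, Z @ b - c <= -1]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)  # 'optimal' iff the two sets are linearly separable
```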

SLIDE 45

Example

SLIDE 46

Robust Linear Discrimination

 Seek the function that gives the maximum possible 'gap' between the two sets of points:

$$\begin{array}{ll} \max & u \\ \text{s.t.} & b^T y_j - c \ge u, \; j = 1, \ldots, O \\ & b^T z_j - c \le -u, \; j = 1, \ldots, N \\ & \|b\|_2 \le 1 \end{array}$$

 $b$ is normalized by the constraint $\|b\|_2 \le 1$
 The optimal value $u^*$ is positive if and only if the two sets of points can be linearly discriminated
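A minimal cvxpy sketch of the maximum-gap problem on the same kind of toy data:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 2)) + np.array([3.0, 3.0])
Z = rng.standard_normal((20, 2))

b, c, u = cp.Variable(2), cp.Variable(), cp.Variable()
constraints = [Y @ b - c >= u, Z @ b - c <= -u, cp.norm(b, 2) <= 1]
prob = cp.Problem(cp.Maximize(u), constraints)
prob.solve()
print(prob.value)  # u* > 0 iff the sets can be linearly discriminated
```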

SLIDE 47

Example

 If $\|b\|_2 = 1$, then $b^T y_j - c$ is the Euclidean distance from the point $y_j$ to the separating hyperplane $\{v \mid b^T v = c\}$
 Similarly, $c - b^T z_j$ is the distance from $z_j$ to the hyperplane

SLIDE 48

Outline

 Norm Approximation
   Basic Norm Approximation
   Penalty Function Approximation
   Approximation with Constraints
 Least-norm Problems
 Regularized Approximation
 Classification
   Linear Discrimination
   Support Vector Classifier
   Logistic Regression

SLIDE 49

Support Vector Classifier

 When the two sets of points cannot be linearly separated
 We might seek an affine function that minimizes the number of points misclassified
 Unfortunately, this is in general a difficult combinatorial optimization problem

SLIDE 50

Support Vector Classifier

 When the two sets of points cannot be linearly separated
 Relax the normalized constraints
$$b^T y_j - c \ge 1, \; j = 1, \ldots, O, \qquad b^T z_j - c \le -1, \; j = 1, \ldots, N$$
by introducing nonnegative variables $v_1, \ldots, v_O$ and $w_1, \ldots, w_N$:
$$b^T y_j - c \ge 1 - v_j, \; j = 1, \ldots, O, \qquad b^T z_j - c \le -(1 - w_j), \; j = 1, \ldots, N$$
 When $v = w = 0$, we recover the original constraints
 By making $v$ and $w$ large enough, these inequalities can always be made feasible

SLIDE 51

Support Vector Classifier

 Our goal is to find $b$, $c$ and sparse nonnegative $v$ and $w$ that satisfy the inequalities
 As a heuristic, we can minimize the sum of the variables $v_j$ and $w_j$:

$$\begin{array}{ll} \min & \mathbf{1}^T v + \mathbf{1}^T w \\ \text{s.t.} & b^T y_j - c \ge 1 - v_j, \; j = 1, \ldots, O \\ & b^T z_j - c \le -(1 - w_j), \; j = 1, \ldots, N \\ & v \succeq 0, \; w \succeq 0 \end{array}$$

 When $0 < v_j < 1$, the point $y_j$ is classified correctly, but still incurs a loss

SLIDE 52

Example

SLIDE 53

Support Vector Classifier

 More generally, we can consider the trade-off between the number of misclassified points and the width of the slab $\{v \mid -1 \le b^T v - c \le 1\}$, which is given by $2/\|b\|_2$
 We want to minimize the classification error and maximize the width of the slab:

$$\begin{array}{ll} \min & \|b\|_2 + \delta (\mathbf{1}^T v + \mathbf{1}^T w) \\ \text{s.t.} & b^T y_j - c \ge 1 - v_j, \; j = 1, \ldots, O \\ & b^T z_j - c \le -(1 - w_j), \; j = 1, \ldots, N \\ & v \succeq 0, \; w \succeq 0 \end{array}$$
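A minimal cvxpy sketch of this support vector classifier (overlapping toy clouds; $\delta$ is a placeholder trade-off weight):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 2)) + np.array([1.0, 1.0])  # overlapping sets
Z = rng.standard_normal((30, 2))
delta = 1.0

b, c = cp.Variable(2), cp.Variable()
v, w = cp.Variable(30, nonneg=True), cp.Variable(30, nonneg=True)
objective = cp.norm(b, 2) + delta * (cp.sum(v) + cp.sum(w))
constraints = [Y @ b - c >= 1 - v, Z @ b - c <= -(1 - w)]
cp.Problem(cp.Minimize(objective), constraints).solve()
print(2 / np.linalg.norm(b.value))  # width of the slab
```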

SLIDE 54

Example

SLIDE 55

Outline

 Norm Approximation
   Basic Norm Approximation
   Penalty Function Approximation
   Approximation with Constraints
 Least-norm Problems
 Regularized Approximation
 Classification
   Linear Discrimination
   Support Vector Classifier
   Logistic Regression

SLIDE 56

Logistic Regression

 $z$ is a random variable with values 0 or 1, with a distribution that depends on $v \in \mathbf{R}^o$
 Logistic Model:
$$\operatorname{prob}(z = 1) = \frac{\exp(b^T v - c)}{1 + \exp(b^T v - c)}, \qquad \operatorname{prob}(z = 0) = \frac{1}{1 + \exp(b^T v - c)}$$
 The two given sets of points, $\{y_1, \ldots, y_O\}$ (where $z = 1$) and $\{z_1, \ldots, z_N\}$ (where $z = 0$), are assumed to arise as samples from the logistic model

SLIDE 57

Logistic Regression

 Maximum Likelihood Estimation

$$\max_{b, c} \; m(b, c)$$

 $m$ is the log-likelihood function:
$$m(b, c) = \sum_{j=1}^{O} (b^T y_j - c) - \sum_{j=1}^{O} \log\left(1 + \exp(b^T y_j - c)\right) - \sum_{j=1}^{N} \log\left(1 + \exp(b^T z_j - c)\right)$$
 A concave maximization problem
 If the two sets of points can be linearly separated, the problem has no solution: scaling $(b, c)$ up drives the log-likelihood toward its supremum, so the parameters diverge
 A remedy is to add constraints on $(b, c)$
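A minimal cvxpy sketch of the maximum likelihood problem, using the built-in atom cp.logistic(x) = log(1 + e^x) (overlapping toy data so the problem stays bounded):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 2)) + np.array([1.0, 1.0])  # samples with z = 1
Z = rng.standard_normal((30, 2))                          # samples with z = 0

b, c = cp.Variable(2), cp.Variable()
# Log-likelihood m(b, c); cp.logistic(x) = log(1 + exp(x)).
m = (cp.sum(Y @ b - c)
     - cp.sum(cp.logistic(Y @ b - c))
     - cp.sum(cp.logistic(Z @ b - c)))
cp.Problem(cp.Maximize(m)).solve()
print(b.value, c.value)
```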

SLIDE 58

Example

SLIDE 59

Summary

 Norm Approximation
   Basic Norm Approximation
   Penalty Function Approximation
   Approximation with Constraints
 Least-norm Problems
 Regularized Approximation
 Classification
   Linear Discrimination
   Support Vector Classifier
   Logistic Regression