Multiplicative Updates for Nonnegative Least Squares (Donghui Chen)


SLIDE 1

Multiplicative Updates for Nonnegative Least Squares

Donghui Chen
School of Securities and Futures, Southwestern University of Finance and Economics

November 18, 2013
Joint work with Matt Brand, Mitsubishi Electric Research Labs

D. Chen (SWUFE) · NNLS · November 18, 2013 · 1 / 23

SLIDE 2

"What really matters is the wisdom he teaches you, ..." – Sofia Pauca

SLIDE 3

Outline

1. Introduction
2. Multiplicative NNLS Iteration: The Algorithm, Properties, Convergence Analysis, Sparse Solution, Acceleration
3. Numerical Experiments: Image Labelling
4. Concluding Remarks


SLIDE 6

Objective function

Nonnegative Least Squares:

argmin_x F(x) = argmin_x ||Ax − b||_2^2   s.t. x ≥ 0.   (1)

Because

||Ax − b||_2^2 = (Ax − b)^T (Ax − b)
             = x^T (A^T A) x − b^T (Ax) − (Ax)^T b + b^T b   (the middle two terms are scalars; b^T b is constant)
             = x^T (A^T A) x − 2 x^T (A^T b) + b^T b,

solving Equation (1) is equivalent to solving

argmin_x F(x) = argmin_x (1/2) x^T Q x − x^T h   s.t. x ≥ 0,   (2)

with Q = A^T A and h = A^T b.
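The algebraic equivalence between (1) and (2) is easy to sanity-check numerically; a minimal sketch (the matrices below are arbitrary examples, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.random(3)            # any nonnegative test point

Q = A.T @ A
h = A.T @ b

lhs = np.sum((A @ x - b) ** 2)        # ||Ax - b||_2^2, objective (1)
rhs = x @ Q @ x - 2 * (x @ h) + b @ b # expanded quadratic, objective (2) scaled
assert np.isclose(lhs, rhs)
```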



SLIDE 9

Multiplicative NNLS Iteration

Theorem (Multiplicative NNLS Iteration)
The nonnegative least squares objective F(x) in Equation (2) is monotonically decreasing under the multiplicative update

x_i^{k+1} = x_i^k · (2(Q^- x^k)_i + h_i^+ + δ) / ((|Q| x^k)_i + h_i^- + δ),   (3)

with δ > 0, Q^- = −min(Q, 0), |Q| = abs(Q), h^+ = max(h, 0), h^- = −min(h, 0), all taken element-wise.

Remark: If Q and h have only nonnegative components and δ = 0, the above iteration reduces to

x_i^{k+1} = x_i^k · h_i / (Q x^k)_i,

which is the image space reconstruction algorithm (ISRA)¹. Lee and Seung² generalized the ISRA idea to NMF.

¹ M. E. Daube-Witherspoon, G. Muehllehner, IEEE Trans. on Medical Imaging, 1986.
² D. Lee, S. Seung, Nature, 1999.
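Update (3) can be sketched in a few lines of NumPy (a minimal illustration, not the authors' implementation; the function name and defaults are mine):

```python
import numpy as np

def mult_nnls(A, b, x0, iters=500, delta=1e-9):
    """Sketch of the multiplicative NNLS update (3):
    split Q = A^T A and h = A^T b into positive/negative parts,
    then apply the element-wise multiplicative rule."""
    Q = A.T @ A
    h = A.T @ b
    Q_neg = -np.minimum(Q, 0.0)   # Q^-
    Q_abs = np.abs(Q)             # |Q|
    h_pos = np.maximum(h, 0.0)    # h^+
    h_neg = -np.minimum(h, 0.0)   # h^-
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x *= (2 * (Q_neg @ x) + h_pos + delta) / (Q_abs @ x + h_neg + delta)
    return x

# On a trivially separable problem the iterates satisfy Qx = h, i.e.
# the unconstrained least squares solution when it is nonnegative.
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 4.0])
x = mult_nnls(A, b, x0=[0.5, 0.5])
```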

SLIDE 10

Gradient Descent Property

The multiplicative update (3) is an element-wise iterative gradient descent method:

x_i^{k+1} − x_i^k = [(2(Q^- x^k)_i + h_i^+ + δ) / ((|Q| x^k)_i + h_i^- + δ)] · x_i^k − x_i^k
                  = [(2(Q^- x^k)_i + h_i^+ − (|Q| x^k)_i − h_i^-) / ((|Q| x^k)_i + h_i^- + δ)] · x_i^k
                  = −[((Q x^k)_i − h_i) / ((|Q| x^k)_i + h_i^- + δ)] · x_i^k
                  = −[x_i^k / ((|Q| x^k)_i + h_i^- + δ)] · ((Q x^k)_i − h_i)
                  = −γ_i^k ∇F(x^k)_i,

where the step size γ_i^k = x_i^k / ((|Q| x^k)_i + h_i^- + δ) and ∇F(x^k) = Q x^k − h. (The third line uses |Q| − 2Q^- = Q and h^+ − h^- = h.)
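The identity above is exact, so one multiplicative step must coincide with the scaled gradient step to machine precision; a quick numerical check (the symmetric Q and the point x are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M + M.T                       # generic symmetric Q with mixed-sign entries
h = rng.standard_normal(4)
x = rng.random(4) + 0.1           # a positive iterate
delta = 0.5

Q_neg = -np.minimum(Q, 0.0)
Q_abs = np.abs(Q)
h_pos = np.maximum(h, 0.0)
h_neg = -np.minimum(h, 0.0)

# one multiplicative step, update (3)
x_new = x * (2 * (Q_neg @ x) + h_pos + delta) / (Q_abs @ x + h_neg + delta)

# the same step written as element-wise scaled gradient descent
gamma = x / (Q_abs @ x + h_neg + delta)   # per-coordinate step size
grad = Q @ x - h                          # gradient of F at x
assert np.allclose(x_new - x, -gamma * grad)
```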

SLIDE 11

What if δ = 0?

Suppose Q = [ 1  −1 ; −1  1 ], h = 0, with initial guess x^0 = (2/3, 4/3). Then x^1 = (4/3, 2/3), x^2 = (2/3, 4/3), ···: with δ = 0 the iterates oscillate and never settle. However, the optimal solutions are x* = (r, r) for any r ≥ 0.

[Figure: iterations by (3) with δ = 0]

SLIDE 12

Positive δ

Same Q = [ 1  −1 ; −1  1 ], h = 0, and initial guess x^0 = (2/3, 4/3). With δ = 1 the iterates now converge: x^∞ = (1, 1).

[Figure: iterations by (3) with δ = 1]
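The stabilizing effect of a positive δ can be checked directly. The sketch below runs update (3) on this Q with δ = 1 and only asserts that the iterates reach the optimal set {x : x_1 = x_2} (i.e. the gradient vanishes); the particular limit point depends on the trajectory:

```python
import numpy as np

Q = np.array([[1.0, -1.0], [-1.0, 1.0]])
Q_neg = -np.minimum(Q, 0.0)
Q_abs = np.abs(Q)
delta = 1.0

x = np.array([2.0 / 3.0, 4.0 / 3.0])
for _ in range(200):
    # update (3) with h = 0
    x *= (2 * (Q_neg @ x) + delta) / (Q_abs @ x + delta)

# the iterates settle on the optimal set x_1 = x_2, where Qx = 0
assert abs(x[0] - x[1]) < 1e-8
assert np.allclose(Q @ x, 0.0, atol=1e-7)
```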

SLIDE 13

Convergence Analysis

Definition (Auxiliary Function)
For positive vectors x, y, an auxiliary function G(x, y) of F(x) has the following two properties:

  • F(x) < G(x, y) if x ≠ y;
  • F(x) = G(x, x).



SLIDE 16

Convergence Analysis contd.

Lemma
Assume G(x, y) is an auxiliary function of F(x). Then F(x) is strictly decreasing under the update x^{k+1} = argmin_x G(x, x^k) unless x^{k+1} = x^k.

Proof: By the definition of an auxiliary function G(x, y), if x^{k+1} ≠ x^k, we have

F(x^{k+1}) < G(x^{k+1}, x^k) ≤ G(x^k, x^k) = F(x^k).

Equality is attained if and only if x^{k+1} = x^k.

SLIDE 17

Convergence Analysis contd.

Lemma
For any positive vectors x, y, define the diagonal matrix D(y) with diagonal elements

D_ii = ((|Q| y)_i + h_i^- + δ) / y_i,   i = 1, 2, ···, n,

where δ > 0. The function

G(x, y) = F(y) + (x − y)^T ∇F(y) + (1/2)(x − y)^T D(y)(x − y)

is an auxiliary function for F(x) = (1/2) x^T Q x − x^T h.
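Both facts this lemma delivers can be checked numerically: G(·, y) majorizes F and touches it at x = y, and the unconstrained minimizer of G(·, y), namely x = y − D(y)^{-1}∇F(y), is exactly update (3). A sketch under arbitrary example data:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
Q = M + M.T                       # generic symmetric Q
h = rng.standard_normal(3)
delta = 0.5

def F(x):
    return 0.5 * x @ Q @ x - x @ h

y = rng.random(3) + 0.1           # current (positive) iterate
# D(y) from the lemma: D_ii = ((|Q|y)_i + h_i^- + delta) / y_i
D = np.diag((np.abs(Q) @ y - np.minimum(h, 0.0) + delta) / y)

def G(x):
    d = x - y
    return F(y) + d @ (Q @ y - h) + 0.5 * d @ D @ d

# majorization: G lies above F away from y and touches it at y
x = rng.random(3) + 0.1
assert G(x) > F(x)
assert np.isclose(G(y), F(y))

# minimizing G(., y) reproduces the multiplicative update (3)
x_min = y - (Q @ y - h) / np.diag(D)
h_pos = np.maximum(h, 0.0)
h_neg = -np.minimum(h, 0.0)
x_upd = y * (2 * (-np.minimum(Q, 0.0) @ y) + h_pos + delta) \
          / (np.abs(Q) @ y + h_neg + delta)
assert np.allclose(x_min, x_upd)
```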

SLIDE 18

Review

Theorem (Multiplicative NNLS Iteration)
The nonnegative least squares objective

argmin_x F(x) = argmin_x (1/2) x^T Q x − x^T h   s.t. x ≥ 0

is monotonically decreasing under the multiplicative update

x_i^{k+1} = x_i^k · (2(Q^- x^k)_i + h_i^+ + δ) / ((|Q| x^k)_i + h_i^- + δ),

with δ > 0, Q^- = −min(Q, 0), |Q| = abs(Q), h^+ = max(h, 0), h^- = −min(h, 0).

SLIDE 19

Review contd.

Suppose Q = [ 1  −1 ; −1  1 ], h = 0, with initial guess x^0 = (2/3, 4/3). Then x^∞ = (1, 1).

[Figure: iterations by (3) with δ = 1]


SLIDE 21

Sparse Solution?

If a sparse solution is expected, it is recommended to add a regularization term to the original least squares problem:

argmin_x F̂(x) = argmin_x ||Ax − b||_2^2 + λ||x||_1,   x ≥ 0, λ > 0,   (4)

with positive λ as the regularization parameter.

Theorem
The objective function F̂(x) in (4) is monotonically decreasing under the multiplicative update

x_i^{k+1} = x_i^k · (2(Q^- x^k)_i + h_i^+) / ((|Q| x^k)_i + h_i^- + λ),   (5)

with λ > 0.

SLIDE 22

Sparse Solution contd.

Suppose Q = [ 1  −1 ; −1  1 ], h = 0, with initial guess x^0 = (2/3, 4/3). Under update (5) with λ = 2, x^∞ = (0, 0): the regularization drives the iterates to the sparsest point of the optimal set.

[Figure: iterations by (5) with λ = 2]
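This example can be reproduced directly with update (5); the λ in the denominator shrinks every coordinate geometrically once the iterates reach the optimal set:

```python
import numpy as np

Q = np.array([[1.0, -1.0], [-1.0, 1.0]])
lam = 2.0
Q_neg = -np.minimum(Q, 0.0)
Q_abs = np.abs(Q)

x = np.array([2.0 / 3.0, 4.0 / 3.0])
for _ in range(300):
    # update (5) with h = 0: lambda appears only in the denominator
    x *= (2 * (Q_neg @ x)) / (Q_abs @ x + lam)

# the regularized iteration collapses to the sparse solution (0, 0)
assert np.all(x < 1e-8)
```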


SLIDE 24

Image Labelling²

f(x) := Σ_{a=1}^{K} Σ_i [ (η/2) Σ_{j∈N(i)} ω_ij (x_ia − x_ja)^2 + d_ia x_ia ]

with constraints, for all i: Σ_{a=1}^{K} x_ia = 1, x_ia ≥ 0, where

  • x_ia is the probability that pixel i belongs to labelling set a
  • K is the number of labelling sets
  • ω_ij is the weight between adjacent pixels i and j:
    ω_ij := (I_i^T I_j) / (|I_i| · |I_j|) = cos(θ), where I_· is the image value
  • N(i) represents the neighbours of pixel i
  • η is a parameter controlling the spatial smoothness
  • d_ia is the cost of label a at each pixel

² M. Rivera, O. Dalmau, and J. Tago, in ICPR, pp. 1-5, 2008.
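The labelling objective can be made concrete on a toy problem. The sketch below evaluates f(x) for a 1-D "image" of three pixels and K = 2 labels; the pixel values, costs d, and η are invented for illustration only:

```python
import numpy as np

# toy setup: 3 pixels with RGB-like values, K = 2 labels
I = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 0.0, 1.0]])
d = np.array([[0.1, 0.9],     # d[i, a]: cost of label a at pixel i
              [0.2, 0.8],
              [0.9, 0.1]])
eta = 1.0
neighbours = {0: [1], 1: [0, 2], 2: [1]}   # 1-D chain adjacency

def omega(i, j):
    # cosine similarity between adjacent pixel values
    return I[i] @ I[j] / (np.linalg.norm(I[i]) * np.linalg.norm(I[j]))

def f(x):
    # x[i, a]: probability of pixel i taking label a; rows sum to 1
    total = 0.0
    for a in range(x.shape[1]):
        for i in range(x.shape[0]):
            smooth = sum(omega(i, j) * (x[i, a] - x[j, a]) ** 2
                         for j in neighbours[i])
            total += 0.5 * eta * smooth + d[i, a] * x[i, a]
    return total

x = np.full((3, 2), 0.5)   # uniform labelling satisfies the constraints
val = f(x)                 # smoothness terms vanish; only the data costs remain
```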


SLIDE 26

Image Labelling: Matrix d³

  • Gaussian mixture
    ◮ Assume the data points were drawn from N independent Gaussian distributions with means μ_l and covariances Σ_l.
    ◮ Compute the Mahalanobis distance between each pixel i and these Gaussian distributions:
      d_ia = Σ_l (x_i − μ_la)^T Σ_la^{-1} (x_i − μ_la) + log|Σ_la|
  • Support Vector Machine (SVM)
    ◮ Use SVM to find the support vectors for each labelling set.
    ◮ Compute the decision function:
      d_ia = Σ_l α_la K(x_i, SV_la) + b_a,
      where K(·, ·) is the kernel function in SVM, α_la are the coefficients, and b_a is the bias for labelling set a.

³ C. Chang, C. Lin, LIBSVM, 2001.
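One term of the Gaussian-mixture cost might be sketched as follows; I read the log(Σ) on the slide as the log-determinant penalty of the Gaussian log-likelihood, which is an assumption, and the function name is mine:

```python
import numpy as np

def mahalanobis_cost(xi, mu, cov):
    """Cost of assigning pixel value xi to one Gaussian component:
    squared Mahalanobis distance plus a log-determinant penalty
    (assumed interpretation of log(Sigma) on the slide)."""
    diff = xi - mu
    cov_inv = np.linalg.inv(cov)
    return diff @ cov_inv @ diff + np.log(np.linalg.det(cov))

mu = np.array([0.0, 0.0])
cov = np.eye(2)          # identity covariance: penalty term is zero
xi = np.array([1.0, 1.0])
c = mahalanobis_cost(xi, mu, cov)
```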

SLIDE 27

Image Labelling contd.

[Figure: image labelling results]


SLIDE 29

Conclusion

Introduced a new algorithm, along with its convergence analysis, for the NNLS problem

argmin_x F(x) = argmin_x ||Ax − b||_2^2   s.t. x ≥ 0,

via the multiplicative update

x_i^{k+1} = x_i^k · (2(Q^- x^k)_i + h_i^+ + δ) / ((|Q| x^k)_i + h_i^- + δ),

where Q = A^T A and h = A^T b.

SLIDE 30

Happy Birthday, Bob!
