Multiplicative Updates for Nonnegative Least Squares (Donghui Chen)


SLIDE 1

Multiplicative Updates for Nonnegative Least Squares

Donghui Chen
School of Securities and Futures, Southwestern University of Finance and Economics

November 18, 2013
Joint work with Matt Brand, Mitsubishi Electric Research Labs

D. Chen (SWUFE) · NNLS · November 18, 2013 · 1 / 23

SLIDE 2

"What really matters is the wisdom he teaches you, ..." – Sofia Pauca

SLIDE 3

Outline

1. Introduction
2. Multiplicative NNLS Iteration: The Algorithm, Properties, Convergence Analysis, Sparse Solution, Acceleration
3. Numerical Experiments: Image Labelling
4. Concluding Remarks


SLIDE 6

Objective function

Nonnegative Least Squares:

argmin_x F(x) = argmin_x ||Ax − b||_2^2   s.t. x ≥ 0.   (1)

Because

||Ax − b||_2^2 = (Ax − b)^T (Ax − b)
             = x^T (A^T A) x − b^T (Ax) − (Ax)^T b + b^T b   (the middle two terms are scalars; b^T b is constant)
             = x^T (A^T A) x − 2 x^T (A^T b) + b^T b,

solving Equation (1) is equivalent to solving

argmin_x F(x) = argmin_x (1/2) x^T Q x − x^T h   s.t. x ≥ 0,   (2)

with Q = A^T A and h = A^T b.
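The algebraic equivalence between (1) and (2) is easy to sanity-check numerically; a minimal sketch (the matrices below are arbitrary examples, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.random(3)            # any nonnegative test point

Q = A.T @ A
h = A.T @ b

lhs = np.sum((A @ x - b) ** 2)        # ||Ax - b||_2^2, objective (1)
rhs = x @ Q @ x - 2 * (x @ h) + b @ b # expanded quadratic, objective (2) scaled
assert np.isclose(lhs, rhs)
```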



SLIDE 9

Multiplicative NNLS Iteration

Theorem (Multiplicative NNLS Iteration)
The nonnegative least squares objective F(x) in Equation (2) is monotonically decreasing under the multiplicative update

x_i^{k+1} = x_i^k · (2(Q^- x^k)_i + h_i^+ + δ) / ((|Q| x^k)_i + h_i^- + δ),   (3)

with δ > 0, Q^- = −min(Q, 0), |Q| = abs(Q), h^+ = max(h, 0), h^- = −min(h, 0), all taken element-wise.

Remark: If Q and h have only nonnegative components and δ = 0, the above iteration reduces to

x_i^{k+1} = x_i^k · h_i / (Q x^k)_i,

which is the image space reconstruction algorithm (ISRA)¹. Lee and Seung² generalized the ISRA idea to NMF.

¹ M. E. Daube-Witherspoon, G. Muehllehner, IEEE Trans. on Medical Imaging, 1986.
² D. Lee, S. Seung, Nature, 1999.
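Update (3) can be sketched in a few lines of NumPy (a minimal illustration, not the authors' implementation; the function name and defaults are mine):

```python
import numpy as np

def mult_nnls(A, b, x0, iters=500, delta=1e-9):
    """Sketch of the multiplicative NNLS update (3):
    split Q = A^T A and h = A^T b into positive/negative parts,
    then apply the element-wise multiplicative rule."""
    Q = A.T @ A
    h = A.T @ b
    Q_neg = -np.minimum(Q, 0.0)   # Q^-
    Q_abs = np.abs(Q)             # |Q|
    h_pos = np.maximum(h, 0.0)    # h^+
    h_neg = -np.minimum(h, 0.0)   # h^-
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x *= (2 * (Q_neg @ x) + h_pos + delta) / (Q_abs @ x + h_neg + delta)
    return x

# On a trivially separable problem the iterates satisfy Qx = h, i.e.
# the unconstrained least squares solution when it is nonnegative.
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 4.0])
x = mult_nnls(A, b, x0=[0.5, 0.5])
```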

SLIDE 10

Gradient Descent Property

The multiplicative update (3) is an element-wise iterative gradient descent method:

x_i^{k+1} − x_i^k = [(2(Q^- x^k)_i + h_i^+ + δ) / ((|Q| x^k)_i + h_i^- + δ)] · x_i^k − x_i^k
                  = [(2(Q^- x^k)_i + h_i^+ − (|Q| x^k)_i − h_i^-) / ((|Q| x^k)_i + h_i^- + δ)] · x_i^k
                  = −[((Q x^k)_i − h_i) / ((|Q| x^k)_i + h_i^- + δ)] · x_i^k
                  = −[x_i^k / ((|Q| x^k)_i + h_i^- + δ)] · ((Q x^k)_i − h_i)
                  = −γ_i^k ∇F(x^k)_i,

where the step size γ_i^k = x_i^k / ((|Q| x^k)_i + h_i^- + δ) and ∇F(x^k) = Q x^k − h. (The third line uses |Q| − 2Q^- = Q and h^+ − h^- = h.)
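The identity above is exact, so one multiplicative step must coincide with the scaled gradient step to machine precision; a quick numerical check (the symmetric Q and the point x are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M + M.T                       # generic symmetric Q with mixed-sign entries
h = rng.standard_normal(4)
x = rng.random(4) + 0.1           # a positive iterate
delta = 0.5

Q_neg = -np.minimum(Q, 0.0)
Q_abs = np.abs(Q)
h_pos = np.maximum(h, 0.0)
h_neg = -np.minimum(h, 0.0)

# one multiplicative step, update (3)
x_new = x * (2 * (Q_neg @ x) + h_pos + delta) / (Q_abs @ x + h_neg + delta)

# the same step written as element-wise scaled gradient descent
gamma = x / (Q_abs @ x + h_neg + delta)   # per-coordinate step size
grad = Q @ x - h                          # gradient of F at x
assert np.allclose(x_new - x, -gamma * grad)
```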

SLIDE 11

What if δ = 0?

Suppose Q = [ 1  −1 ; −1  1 ], h = 0, with initial guess x^0 = (2/3, 4/3). Then x^1 = (4/3, 2/3), x^2 = (2/3, 4/3), ···: with δ = 0 the iterates oscillate and never settle. However, the optimal solutions are x* = (r, r) for any r ≥ 0.

[Figure: iterations by (3) with δ = 0]

SLIDE 12

Positive δ

Same Q = [ 1  −1 ; −1  1 ], h = 0, and initial guess x^0 = (2/3, 4/3). With δ = 1 the iterates now converge: x^∞ = (1, 1).

[Figure: iterations by (3) with δ = 1]
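The stabilizing effect of a positive δ can be checked directly. The sketch below runs update (3) on this Q with δ = 1 and only asserts that the iterates reach the optimal set {x : x_1 = x_2} (i.e. the gradient vanishes); the particular limit point depends on the trajectory:

```python
import numpy as np

Q = np.array([[1.0, -1.0], [-1.0, 1.0]])
Q_neg = -np.minimum(Q, 0.0)
Q_abs = np.abs(Q)
delta = 1.0

x = np.array([2.0 / 3.0, 4.0 / 3.0])
for _ in range(200):
    # update (3) with h = 0
    x *= (2 * (Q_neg @ x) + delta) / (Q_abs @ x + delta)

# the iterates settle on the optimal set x_1 = x_2, where Qx = 0
assert abs(x[0] - x[1]) < 1e-8
assert np.allclose(Q @ x, 0.0, atol=1e-7)
```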

SLIDE 13

Convergence Analysis

Definition (Auxiliary Function)
For positive vectors x, y, an auxiliary function G(x, y) of F(x) has the following two properties:

  • F(x) < G(x, y) if x ≠ y;
  • F(x) = G(x, x).



SLIDE 16

Convergence Analysis contd.

Lemma
Assume G(x, y) is an auxiliary function of F(x). Then F(x) is strictly decreasing under the update x^{k+1} = argmin_x G(x, x^k) unless x^{k+1} = x^k.

Proof: By the definition of an auxiliary function G(x, y), if x^{k+1} ≠ x^k, we have

F(x^{k+1}) < G(x^{k+1}, x^k) ≤ G(x^k, x^k) = F(x^k).

Equality is attained if and only if x^{k+1} = x^k.

SLIDE 17

Convergence Analysis contd.

Lemma
For any positive vectors x, y, define the diagonal matrix D(y) with diagonal elements

D_ii = ((|Q| y)_i + h_i^- + δ) / y_i,   i = 1, 2, ···, n,

where δ > 0. The function

G(x, y) = F(y) + (x − y)^T ∇F(y) + (1/2)(x − y)^T D(y)(x − y)

is an auxiliary function for F(x) = (1/2) x^T Q x − x^T h.
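Both facts this lemma delivers can be checked numerically: G(·, y) majorizes F and touches it at x = y, and the unconstrained minimizer of G(·, y), namely x = y − D(y)^{-1}∇F(y), is exactly update (3). A sketch under arbitrary example data:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
Q = M + M.T                       # generic symmetric Q
h = rng.standard_normal(3)
delta = 0.5

def F(x):
    return 0.5 * x @ Q @ x - x @ h

y = rng.random(3) + 0.1           # current (positive) iterate
# D(y) from the lemma: D_ii = ((|Q|y)_i + h_i^- + delta) / y_i
D = np.diag((np.abs(Q) @ y - np.minimum(h, 0.0) + delta) / y)

def G(x):
    d = x - y
    return F(y) + d @ (Q @ y - h) + 0.5 * d @ D @ d

# majorization: G lies above F away from y and touches it at y
x = rng.random(3) + 0.1
assert G(x) > F(x)
assert np.isclose(G(y), F(y))

# minimizing G(., y) reproduces the multiplicative update (3)
x_min = y - (Q @ y - h) / np.diag(D)
h_pos = np.maximum(h, 0.0)
h_neg = -np.minimum(h, 0.0)
x_upd = y * (2 * (-np.minimum(Q, 0.0) @ y) + h_pos + delta) \
          / (np.abs(Q) @ y + h_neg + delta)
assert np.allclose(x_min, x_upd)
```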

SLIDE 18

Review

Theorem (Multiplicative NNLS Iteration)
The nonnegative least squares objective

argmin_x F(x) = argmin_x (1/2) x^T Q x − x^T h   s.t. x ≥ 0

is monotonically decreasing under the multiplicative update

x_i^{k+1} = x_i^k · (2(Q^- x^k)_i + h_i^+ + δ) / ((|Q| x^k)_i + h_i^- + δ),

with δ > 0, Q^- = −min(Q, 0), |Q| = abs(Q), h^+ = max(h, 0), h^- = −min(h, 0).

SLIDE 19

Review contd.

Suppose Q = [ 1  −1 ; −1  1 ], h = 0, with initial guess x^0 = (2/3, 4/3). Then x^∞ = (1, 1).

[Figure: iterations by (3) with δ = 1]


SLIDE 21

Sparse Solution?

If a sparse solution is expected, it is recommended to add a regularization term to the original least squares problem:

argmin_x F̂(x) = argmin_x ||Ax − b||_2^2 + λ||x||_1,   x ≥ 0, λ > 0,   (4)

with positive λ as the regularization parameter.

Theorem
The objective function F̂(x) in (4) is monotonically decreasing under the multiplicative update

x_i^{k+1} = x_i^k · (2(Q^- x^k)_i + h_i^+) / ((|Q| x^k)_i + h_i^- + λ),   (5)

with λ > 0.

SLIDE 22

Sparse Solution contd.

Suppose Q = [ 1  −1 ; −1  1 ], h = 0, with initial guess x^0 = (2/3, 4/3). Under update (5) with λ = 2, x^∞ = (0, 0): the regularization drives the iterates to the sparsest point of the optimal set.

[Figure: iterations by (5) with λ = 2]
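This example can be reproduced directly with update (5); the λ in the denominator shrinks every coordinate geometrically once the iterates reach the optimal set:

```python
import numpy as np

Q = np.array([[1.0, -1.0], [-1.0, 1.0]])
lam = 2.0
Q_neg = -np.minimum(Q, 0.0)
Q_abs = np.abs(Q)

x = np.array([2.0 / 3.0, 4.0 / 3.0])
for _ in range(300):
    # update (5) with h = 0: lambda appears only in the denominator
    x *= (2 * (Q_neg @ x)) / (Q_abs @ x + lam)

# the regularized iteration collapses to the sparse solution (0, 0)
assert np.all(x < 1e-8)
```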


SLIDE 24

Image Labelling²

f(x) := Σ_{a=1}^{K} Σ_i [ (η/2) Σ_{j∈N(i)} ω_ij (x_ia − x_ja)^2 + d_ia x_ia ]

with constraints, for all i: Σ_{a=1}^{K} x_ia = 1, x_ia ≥ 0, where

  • x_ia is the probability that pixel i belongs to labelling set a
  • K is the number of labelling sets
  • ω_ij is the weight between adjacent pixels i and j:
    ω_ij := (I_i^T I_j) / (|I_i| · |I_j|) = cos(θ), where I_· is the image value
  • N(i) represents the neighbours of pixel i
  • η is a parameter controlling the spatial smoothness
  • d_ia is the cost of label a at each pixel

² M. Rivera, O. Dalmau, and J. Tago, in ICPR, pp. 1-5, 2008.
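The labelling objective can be made concrete on a toy problem. The sketch below evaluates f(x) for a 1-D "image" of three pixels and K = 2 labels; the pixel values, costs d, and η are invented for illustration only:

```python
import numpy as np

# toy setup: 3 pixels with RGB-like values, K = 2 labels
I = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 0.0, 1.0]])
d = np.array([[0.1, 0.9],     # d[i, a]: cost of label a at pixel i
              [0.2, 0.8],
              [0.9, 0.1]])
eta = 1.0
neighbours = {0: [1], 1: [0, 2], 2: [1]}   # 1-D chain adjacency

def omega(i, j):
    # cosine similarity between adjacent pixel values
    return I[i] @ I[j] / (np.linalg.norm(I[i]) * np.linalg.norm(I[j]))

def f(x):
    # x[i, a]: probability of pixel i taking label a; rows sum to 1
    total = 0.0
    for a in range(x.shape[1]):
        for i in range(x.shape[0]):
            smooth = sum(omega(i, j) * (x[i, a] - x[j, a]) ** 2
                         for j in neighbours[i])
            total += 0.5 * eta * smooth + d[i, a] * x[i, a]
    return total

x = np.full((3, 2), 0.5)   # uniform labelling satisfies the constraints
val = f(x)                 # smoothness terms vanish; only the data costs remain
```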


SLIDE 26

Image Labelling: Matrix d³

  • Gaussian mixture
    ◮ Assume the data points were drawn from N independent Gaussian distributions with means μ_l and covariances Σ_l.
    ◮ Compute the Mahalanobis distance between each pixel i and these Gaussian distributions:
      d_ia = Σ_l (x_i − μ_la)^T Σ_la^{-1} (x_i − μ_la) + log|Σ_la|
  • Support Vector Machine (SVM)
    ◮ Use SVM to find the support vectors for each labelling set.
    ◮ Compute the decision function:
      d_ia = Σ_l α_la K(x_i, SV_la) + b_a,
      where K(·, ·) is the kernel function in SVM, α_la are the coefficients, and b_a is the bias for labelling set a.

³ C. Chang, C. Lin, LIBSVM, 2001.
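One term of the Gaussian-mixture cost might be sketched as follows; I read the log(Σ) on the slide as the log-determinant penalty of the Gaussian log-likelihood, which is an assumption, and the function name is mine:

```python
import numpy as np

def mahalanobis_cost(xi, mu, cov):
    """Cost of assigning pixel value xi to one Gaussian component:
    squared Mahalanobis distance plus a log-determinant penalty
    (assumed interpretation of log(Sigma) on the slide)."""
    diff = xi - mu
    cov_inv = np.linalg.inv(cov)
    return diff @ cov_inv @ diff + np.log(np.linalg.det(cov))

mu = np.array([0.0, 0.0])
cov = np.eye(2)          # identity covariance: penalty term is zero
xi = np.array([1.0, 1.0])
c = mahalanobis_cost(xi, mu, cov)
```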

SLIDE 27

Image Labelling contd.

[Figure: image labelling results]


SLIDE 29

Conclusion

Introduced a new algorithm, along with its convergence analysis, for the NNLS problem

argmin_x F(x) = argmin_x ||Ax − b||_2^2   s.t. x ≥ 0,

via the multiplicative update

x_i^{k+1} = x_i^k · (2(Q^- x^k)_i + h_i^+ + δ) / ((|Q| x^k)_i + h_i^- + δ),

where Q = A^T A and h = A^T b.

SLIDE 30

Happy Birthday, Bob!
