Fast Newton-type Methods for Nonnegative Matrix and Tensor Approximation


SLIDE 1

Fast Newton-type Methods for Nonnegative Matrix and Tensor Approximation

Inderjit S. Dhillon

Department of Computer Sciences The University of Texas at Austin

Joint work with Dongmin Kim and Suvrit Sra

SLIDE 2

Outline

1. Introduction
2. Nonnegative Matrix and Tensor Approximation
3. Existing NNMA Algorithms
4. Newton-type Method for NNMA
5. Experiments
6. Summary

SLIDE 3

Introduction

Problem Setting

Nonnegative matrix approximation (NNMA) problem: $A = [a_1, \dots, a_N]$, $a_i \in \mathbb{R}^M_+$, is the input nonnegative matrix.

Goal: approximate $A$ by conic combinations of nonnegative representative vectors $b_1, \dots, b_K$ such that

$$a_i \approx \sum_{j=1}^{K} b_j c_{ji}, \qquad c_{ji} \ge 0, \; b_j \ge 0,$$

i.e., $A \approx BC$ with $B, C \ge 0$.

SLIDE 4

Introduction

Objective or Distortion Functions

The quality of the approximation A ≈ BC is measured using an appropriate distortion function, for example the Frobenius norm distortion or the Kullback-Leibler divergence. In this presentation, we focus on the Frobenius norm distortion, which leads to the least squares NNMA problem:

$$\min_{B, C \ge 0} \; F(B, C) = \tfrac{1}{2} \|A - BC\|_F^2.$$

SLIDE 5

Nonnegative Matrix Approximation

Basic Framework

The NNMA objective function is not simultaneously convex in B and C, but it is individually convex in B and in C. Most NNMA algorithms are therefore iterative and perform an alternating optimization.

Basic Framework for NNMA algorithms (a code sketch follows the list)

  • 1. Initialize $B^0$ and/or $C^0$; set $t \leftarrow 0$.
  • 2. Fix $B^t$ and solve the problem w.r.t. $C$; obtain $C^{t+1}$.
  • 3. Fix $C^{t+1}$ and solve the problem w.r.t. $B$; obtain $B^{t+1}$.
  • 4. Let $t \leftarrow t + 1$, and repeat Steps 2 and 3 until the convergence criteria are satisfied.
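To make this framework concrete, here is a minimal NumPy sketch; `nnma_alternating`, `solve_C`, and `solve_B` are illustrative names of our own, with the subproblem solvers left as pluggable callables (exact or inexact):

```python
import numpy as np

def nnma_alternating(A, K, solve_C, solve_B, max_iter=100, tol=1e-6, seed=0):
    """Generic alternating framework for min 0.5*||A - BC||_F^2, B, C >= 0."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    B = rng.random((M, K))              # Step 1: nonnegative initialization
    C = rng.random((K, N))
    prev_obj = np.inf
    for t in range(max_iter):
        C = solve_C(A, B, C)            # Step 2: fix B, update C
        B = solve_B(A, B, C)            # Step 3: fix C, update B
        obj = 0.5 * np.linalg.norm(A - B @ C, 'fro') ** 2
        if prev_obj - obj < tol:        # Step 4: stop once progress stalls
            break
        prev_obj = obj
    return B, C
```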

SLIDE 6

Nonnegative Tensor Approximation

Problem Setting

For brevity, consider 3-mode tensors only. Given a nonnegative tensor $\mathcal{A} \in \mathbb{R}^{\ell \times m \times n}_+$, find a nonnegative approximation $\mathcal{T} \in \mathbb{R}^{\ell \times m \times n}_+$ which consists of nonnegative components. Least squares objective function:

$$\|\mathcal{A} - \mathcal{T}\|_F^2 = \sum_{i=1}^{\ell} \sum_{j=1}^{m} \sum_{k=1}^{n} \big( [\mathcal{A}]_{ijk} - [\mathcal{T}]_{ijk} \big)^2.$$

Tensor decompositions: “PARAFAC” or “Tucker”.
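In NumPy this objective is a single elementwise sum of squares; a minimal sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 5, 6))      # nonnegative 3-mode tensor
T = rng.random((4, 5, 6))      # candidate nonnegative approximation
err = np.sum((A - T) ** 2)     # ||A - T||_F^2, summed over i, j, k
```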

SLIDE 7

Nonnegative PARAFAC Decomposition

PARAFAC or outer product decomposition:

$$\min \; \|\mathcal{A} - \mathcal{T}\|_F^2 \quad \text{subject to} \quad \mathcal{T} = \sum_{i=1}^{k} p_i \otimes q_i \otimes r_i,$$

where $\mathcal{A}, \mathcal{T} \in \mathbb{R}^{\ell \times m \times n}$, $P = [p_i] \in \mathbb{R}^{\ell \times k}$, $Q = [q_i] \in \mathbb{R}^{m \times k}$, $R = [r_i] \in \mathbb{R}^{n \times k}$, and $P, Q, R \ge 0$.
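The sum of outer products maps directly onto a single einsum; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def parafac_reconstruct(P, Q, R):
    """T[i,j,k] = sum_r P[i,r] * Q[j,r] * R[k,r], for P (l×k), Q (m×k), R (n×k)."""
    return np.einsum('ir,jr,kr->ijk', P, Q, R)
```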

SLIDE 8

Nonnegative Tucker Decomposition

Tucker decomposition of tensors:

$$\min \; \|\mathcal{A} - \mathcal{T}\|_F^2 \quad \text{subject to} \quad \mathcal{T} = [P, Q, R] \cdot \mathcal{Z},$$

where $\mathcal{A}, \mathcal{T} \in \mathbb{R}^{\ell \times m \times n}$, $\mathcal{Z} \in \mathbb{R}^{p \times q \times r}$, $P \in \mathbb{R}^{\ell \times p}$, $Q \in \mathbb{R}^{m \times q}$, $R \in \mathbb{R}^{n \times r}$, and $\mathcal{Z}, P, Q, R \ge 0$.
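Likewise, the multilinear product $[P, Q, R] \cdot \mathcal{Z}$ is a single einsum over the core; a minimal sketch:

```python
import numpy as np

def tucker_reconstruct(Z, P, Q, R):
    """T = [P,Q,R]·Z: multiply the core Z (p×q×r) by P, Q, R along modes 1, 2, 3."""
    return np.einsum('abc,ia,jb,kc->ijk', Z, P, Q, R)
```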

SLIDE 9

Nonnegative PARAFAC Decomposition

Algorithm - Reduce to NNMA

Basic idea: build a matrix approximation problem. For example, for the matrix factor P:

  • Fix Q and R.
  • Form $Z \in \mathbb{R}^{k \times mn}$ whose $i$-th row is the vectorized $q_i \otimes r_i$.
  • Form $A \in \mathbb{R}^{\ell \times mn}$ whose $i$-th row is the vectorized $\mathcal{A}(i,:,:)$.

Now the problem is

$$\min_{P \ge 0} \; \|A - PZ\|_F^2.$$
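A minimal NumPy sketch of this construction, assuming row-major flattening so that the vectorized $\mathcal{A}(i,:,:)$ lines up with the vectorized $q_i \otimes r_i$; the function name is ours:

```python
import numpy as np

def parafac_subproblem_P(A, Q, R):
    """Build A1 (l×mn) and Z (k×mn) so the P-update is min_{P>=0} ||A1 - P Z||_F^2."""
    l, m, n = A.shape
    k = Q.shape[1]
    Z = np.stack([np.outer(Q[:, i], R[:, i]).ravel() for i in range(k)])  # k × mn
    A1 = A.reshape(l, m * n)   # i-th row is the vectorized A[i, :, :]
    return A1, Z
```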

SLIDE 10

Nonnegative Tucker Decomposition

Algorithm - Update Matrix Factors by Reducing to NNMA

Basic idea: build a matrix approximation problem. For example, for the matrix factor P:

  • Fix $\mathcal{Z}$, Q, and R.
  • Form $\tilde{Z} \in \mathbb{R}^{p \times mn}$ by flattening the tensor $[Q, R] \cdot \mathcal{Z}$ along mode 1; computing $\mathcal{T} = [P, Q, R] \cdot \mathcal{Z}$ is then equivalent to $P\tilde{Z}$.
  • Flatten the tensor $\mathcal{A}$ similarly to obtain a matrix $A \in \mathbb{R}^{\ell \times mn}$.

Now the problem (with $\mathcal{Z}$ fixed) is

$$\min_{P \ge 0} \; \|A - P\tilde{Z}\|_F^2.$$
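A minimal sketch of the same reduction for the Tucker case (`tucker_subproblem_P` is an illustrative name, again assuming row-major flattening):

```python
import numpy as np

def tucker_subproblem_P(A, Z, Q, R):
    """Build A1 (l×mn) and Zt (p×mn) so the P-update is min_{P>=0} ||A1 - P Zt||_F^2."""
    l, m, n = A.shape
    W = np.einsum('abc,jb,kc->ajk', Z, Q, R)   # [Q,R]·Z, a p × m × n tensor
    Zt = W.reshape(W.shape[0], m * n)          # mode-1 flattening of [Q,R]·Z
    A1 = A.reshape(l, m * n)                   # mode-1 flattening of A
    return A1, Zt
```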

SLIDE 11

Existing NNMA Algorithms

NNLS: Column-wise subproblem

The squared Frobenius norm is the sum of squared Euclidean norms of the columns, so optimization over B (or C) boils down to a series of nonnegative least squares (NNLS) problems. For example, fix B; finding the $i$-th column $x$ of C reduces to an NNLS problem:

$$\min_{x} \; f(x) = \tfrac{1}{2} \|Bx - a_i\|_2^2, \quad \text{subject to } x \ge 0.$$
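Each column subproblem can be handed to an off-the-shelf NNLS routine, e.g. SciPy's active-set implementation; a minimal sketch with illustrative random data:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
B = rng.random((50, 10))     # fixed factor
a_i = rng.random(50)         # i-th column of A
x, resid = nnls(B, a_i)      # solves min_{x >= 0} ||B x - a_i||_2
```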

SLIDE 12

Existing NNMA Algorithms

Exact Methods

Basic Framework for Exact Methods

  • 1. Initialize $B^0$ and/or $C^0$; set $t \leftarrow 0$.
  • 2. Fix $B^t$ and find $C^{t+1}$ such that $C^{t+1} = \arg\min_{C \ge 0} F(B^t, C)$.
  • 3. Fix $C^{t+1}$ and find $B^{t+1}$ such that $B^{t+1} = \arg\min_{B \ge 0} F(B, C^{t+1})$.
  • 4. Let $t \leftarrow t + 1$, and repeat Steps 2 and 3 until the convergence criteria are satisfied.

Exact methods based on NNLS algorithms (a projected gradient sketch follows the list):

  • Active set procedure [Lawson & Hanson, 1974]
  • FNNLS [Bro & De Jong, 1997]
  • Interior-point gradient method
  • Projected gradient method [Lin, 2005]
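As one concrete solver for the NNLS subproblem, here is a minimal projected gradient sketch; for simplicity it uses a fixed 1/L step (L the Lipschitz constant of the gradient) in place of the line search used in Lin's method:

```python
import numpy as np

def nnls_projected_gradient(B, a, max_iter=500):
    """Projected gradient for min_{x>=0} 0.5*||Bx - a||_2^2 (fixed-step sketch)."""
    BtB, Bta = B.T @ B, B.T @ a
    L = np.linalg.norm(BtB, 2)              # Lipschitz constant of the gradient
    x = np.zeros(B.shape[1])
    for _ in range(max_iter):
        grad = BtB @ x - Bta
        x = np.maximum(0.0, x - grad / L)   # gradient step, then project onto x >= 0
    return x
```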

SLIDE 13

Existing NNMA Algorithms

Inexact Methods

Basic Framework for Inexact Methods

  • 1. Initialize $B^0$ and/or $C^0$; set $t \leftarrow 0$.
  • 2. Fix $B^t$ and find $C^{t+1}$ such that $F(B^t, C^{t+1}) \le F(B^t, C^t)$.
  • 3. Fix $C^{t+1}$ and find $B^{t+1}$ such that $F(B^{t+1}, C^{t+1}) \le F(B^t, C^{t+1})$.
  • 4. Let $t \leftarrow t + 1$, and repeat Steps 2 and 3 until the convergence criteria are satisfied.

Inexact methods (a sketch of the multiplicative update follows the list):

  • Multiplicative method [Lee & Seung, 1999]
  • Alternating Least Squares (ALS) algorithm
  • “Projected Quasi-Newton” method [Zdunek & Cichocki, 2006]
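For the Frobenius objective, Lee & Seung's multiplicative updates are one line per factor; a minimal sketch (the small `eps` in the denominators is our addition for numerical safety):

```python
import numpy as np

def multiplicative_update(A, B, C, eps=1e-12):
    """One Lee & Seung multiplicative step for min ||A - BC||_F^2, B, C >= 0."""
    C = C * (B.T @ A) / (B.T @ B @ C + eps)   # update C with B fixed
    B = B * (A @ C.T) / (B @ C @ C.T + eps)   # update B with C fixed
    return B, C
```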

SLIDE 14

Existing NNMA Algorithms

Deficiencies

  • Active set based methods: NOT suitable for large-scale problems.
  • Gradient descent based methods: may suffer from slow convergence, known as zigzagging.
  • Newton-type methods: a naïve combination with projection does NOT guarantee convergence.

SLIDE 15

Previous Attempts at Newton-type Methods for NNMA

Difficulties

[Figure: level sets of $f$, comparing the projected scaled-gradient step $P_+[x^k - D^k \nabla f(x^k)]$ with the projected Newton step $P_+[x^k - (G^T G)^{-1}(G^T G x^k - G^T h)]$, where $x^* = x^k - (G^T G)^{-1}(G^T G x^k - G^T h)$ is the unconstrained Newton point.]

A naïve combination of the projection step and non-diagonal gradient scaling does not guarantee convergence; an iteration may actually increase the objective.

SLIDE 16

Projected Newton-type Methods

Ideas from the Previous Methods

  • The active set: if the active variables at the final solution are known in advance, the original problem reduces to an equality-constrained problem; equivalently, one can solve an unconstrained subproblem over the inactive variables.
  • Projection: the projection step identifies active variables at each iteration.
  • Gradient: the gradient information indicates which variables should not be optimized at the next iteration.

SLIDE 17

Projected Newton-type Methods

Overview

  • Combine projection with non-diagonal gradient scaling.
  • At each iteration, partition the variables into two disjoint sets: fixed and free variables.
  • Optimize the objective function over the free variables.
  • Convergence to a stationary point of F is guaranteed.
  • Any positive definite gradient scaling scheme is allowed, e.g., the inverse of the full Hessian, an approximate Hessian from BFGS, conjugate gradient, etc.

SLIDE 18

Projected Newton-type Methods

Fixed Set

Divide the variables into free variables and fixed variables. Fixed set: indices listing the entries of $x^k$ that are held fixed. Definition:

$$I^k = \{\, i : x_i^k = 0,\; [\nabla f(x^k)]_i > 0 \,\}.$$

A subset of the active variables at iteration $k$; contains the active variables that satisfy the KKT conditions.
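In code, the fixed set is simply a boolean mask over the variables; a minimal sketch:

```python
import numpy as np

def fixed_set(x, grad):
    """Mask for I^k = { i : x_i = 0 and [grad f(x)]_i > 0 }."""
    return (x == 0) & (grad > 0)
```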

SLIDE 19

Newton-type Methods

[Figure: same illustration as Slide 15, comparing the projected scaled-gradient step with the projected Newton step.]
SLIDE 20

Fast Newton-type Nonnegative Matrix Approximation

FNMAE & FNMAI: an Exact and an Inexact Method

A subprocedure to update C in FNMAE (see the sketch below):

  • 1. Compute the gradient matrix $\nabla_C F(B; C_{\text{old}})$.
  • 2. Compute the fixed set $I_+$ for $C_{\text{old}}$.
  • 3. Compute the step-length vector $\alpha$.
  • 4. Update $C_{\text{old}}$ as
      $U \leftarrow Z_+[\nabla_C F(B; C_{\text{old}})]$;  // remove gradient information from the fixed variables
      $U \leftarrow Z_+[DU]$;  // keep the fixed variables fixed after scaling
      $C_{\text{new}} \leftarrow P_+[C_{\text{old}} - U \cdot \mathrm{diag}(\alpha)]$  // enforce feasibility
  • 5. $C_{\text{old}} \leftarrow C_{\text{new}}$.
  • 6. Update $D$ if necessary.

FNMAI: to speed up computation, the step size $\alpha$ is parameterized, and the inverse Hessian is used for non-diagonal gradient scaling.
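A rough NumPy sketch of one such C-update, under simplifying assumptions: a single scalar step `alpha` in place of the step-length vector, and the exact inverse Hessian $D = (B^T B)^{-1}$ for the gradient scaling:

```python
import numpy as np

def fnma_update_C(A, B, C, alpha=1.0):
    """One projected Newton-type update of C (simplified sketch)."""
    BtB = B.T @ B
    grad = BtB @ C - B.T @ A                 # gradient of 0.5*||A - BC||_F^2 in C
    fixed = (C <= 0) & (grad > 0)            # fixed set I+
    U = np.where(fixed, 0.0, grad)           # Z+: zero the gradient on fixed vars
    U = np.where(fixed, 0.0, np.linalg.solve(BtB, U))  # Z+[D U], D = (B^T B)^{-1}
    return np.maximum(0.0, C - alpha * U)    # P+: enforce feasibility
```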

SLIDE 21

Experiments

Comparisons against ZC [Zdunek & Cichocki, 2006]

[Plots: relative error of approximation vs. number of iterations for ZC, FNMAI, and FNMAE on matrices with (M,N,K) = (200,40,10), (200,200,20), and (500,200,20).]

(a) Dense (b) Sparse (c) Sparse

Relative approximation error against iteration count for ZC, FNMAI, and FNMAE. The relative errors achieved by both FNMAI and FNMAE are lower than those of ZC. Note that ZC does not decrease the error monotonically.

SLIDE 22

Experiments

Application to Image Processing

[Plot: relative error of approximation vs. number of iterations for ALS, Lee/Seung, and FNMAI on an image matrix with (M,N,K) = (9216,143,20).]

[Images: Original, ALS, LS, and FNMAI reconstructions.]

Image reconstructions as obtained by the ALS, LS, and FNMAI procedures. The reconstruction was computed from a rank-20 approximation. ALS leads to a non-monotonic change in the objective function value.

SLIDE 23

Experiments

Application to Image Processing - Swimmer dataset - rank 13

[Images: Lee & Seung's rank-17 basis vs. FNMAE rank-17 basis.]

SLIDE 24

Experiments

Application to Image Processing - Swimmer dataset - rank 20

[Images: Lee & Seung's rank-20 basis vs. FNMAE rank-20 basis.]

SLIDE 25

Experiments

Application to Image Processing - Swimmer dataset

                               Lee & Seung's   FNMAE
Rank 13   Elapsed CPU time     140.53          47.06
          Objective value      4.49 × 10^7     2.01 × 10^7
Rank 17   Elapsed CPU time     182.24          62.29
          Objective value      2.41 × 10^7     6.85 × 10^-4
Rank 20   Elapsed CPU time     156.18          41.93
          Objective value      5.61 × 10^5     4.71 × 10^3

SLIDE 26

Experiments

Comparison against Lee & Seung-type Algorithms - PARAFAC/ k = 8

[Images: Original, FNTA, and Lee & Seung factors P, Q, R.]

Nonnegative PARAFAC decomposition with k = 8. Original P, Q, R ∈ R^{16×8} with rank 8. The final rank is 8 for both FNTA and Lee & Seung. FNTA gives a smaller reconstruction error.

SLIDE 27

Experiments

Comparison against Lee & Seung-type Algorithms - PARAFAC/ k = 16

[Images: Original, FNTA, and Lee & Seung factors P, Q, R.]

Nonnegative PARAFAC decomposition with k = 16. Original P, Q, R ∈ R^{16×8} with rank 8. The final ranks are 11 for FNTA and 16 for Lee & Seung. FNTA produces a sparser, lower-rank solution.

SLIDE 28

Experiments

Comparison against Lee & Seung-type Algorithms - Tucker

[Images: Original, Lee & Seung, and FNTA factor P.]

Nonnegative Tucker decomposition with [p q r] = [8 8 8]. Original P, Q, R as before; the original core tensor has 1 for all entries. FNTA gives a smaller reconstruction error. Both methods fit the original tensor very well (error < 10^-4), but unlike PARAFAC, both are unable to discover the factors.

SLIDE 29

Simultaneous Update of Factors

Instead of alternating optimization between B and C, update B and C jointly. For example, after computing $\bar{B}$ and $\bar{C}$ such that

$$\bar{B} = \arg\min_{B \ge 0} \|A - B C^k\|_F^2, \qquad \bar{C} = \arg\min_{C \ge 0} \|A - B^k C\|_F^2,$$

solve the two-dimensional bound-constrained optimization problem

$$\min_{0 \le (\beta, \gamma) \le 1} \; \big\| A - \big(B^k + \beta(\bar{B} - B^k)\big)\big(C^k + \gamma(\bar{C} - C^k)\big) \big\|_F^2.$$
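A minimal sketch of this two-dimensional search, using SciPy's bound-constrained L-BFGS-B as one reasonable choice of solver for the box-constrained problem:

```python
import numpy as np
from scipy.optimize import minimize

def joint_step(A, Bk, Ck, Bbar, Cbar):
    """Search over (beta, gamma) in [0,1]^2 and return the blended factors."""
    def obj(v):
        B = Bk + v[0] * (Bbar - Bk)
        C = Ck + v[1] * (Cbar - Ck)
        return np.linalg.norm(A - B @ C, 'fro') ** 2
    res = minimize(obj, x0=[1.0, 1.0], bounds=[(0, 1), (0, 1)], method='L-BFGS-B')
    beta, gamma = res.x
    return Bk + beta * (Bbar - Bk), Ck + gamma * (Cbar - Ck)
```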

SLIDE 30

Summary

  • Nonnegative matrix and tensor approximation problems
  • Non-diagonal gradient scaling can give faster convergence
  • Algorithmic framework based on partitioning of variables:

  • an exact & provably convergent method (more accurate)
  • an inexact method analogous to ALS (faster)
  • extensions to NNTA

In progress...

  • More general distortion functions, e.g., Bregman divergences
  • Publicly available software toolbox

A MATLAB Implementation of FNMAE is now available at www.cs.utexas.edu/users/dmkim/Source/software/nnma/index.html

SLIDE 31

References

  • R. Bro and S. De Jong.

A Fast Non-negativity-constrained Least Squares Algorithm. Journal of Chemometrics, 11(5):393–401, 1997.

  • D. Kim, S. Sra, and I. S. Dhillon.

Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem. In Proceedings of the SIAM Conference on Data Mining, 2007.

  • C. L. Lawson and R. J. Hanson.

Solving Least Squares Problems. Prentice–Hall, 1974.

  • D. D. Lee and H. S. Seung.

Learning the Parts of Objects by Non-negative Matrix Factorization. Nature, 401:788–791, 1999.

  • C.-J. Lin.

Projected Gradient Methods for Non-negative Matrix Factorization. Technical Report ISSTECH-95-013, National Taiwan University, 2005.

  • A. Shashua and T. Hazan.

Non-negative Tensor Factorization with Applications to Statistics and Computer Vision. In International Conference on Machine Learning, pages 792–799, 2005.

  • R. Zdunek and A. Cichocki.

Non-Negative Matrix Factorization with Quasi-Newton Optimization. In Eighth International Conference on Artificial Intelligence and Soft Computing, ICAISC, pages 870–879, 2006.