SLIDE 1

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds

Tao Wu

Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz

joint work with Prof. Michael Hintermüller


SLIDE 2

Low-rank paradigm.

Low-rank matrices arise in one way or another:

◮ low-degree statistical processes; e.g. collaborative filtering, latent semantic indexing.

◮ regularization on complex objects; e.g. manifold learning, metric learning.

◮ approximation of compact operators; e.g. proper orthogonal decomposition.

Fig.: Collaborative filtering (courtesy of wikipedia.org).


SLIDE 3

Robust principal component pursuit.

◮ The sparse component corresponds to pattern-irrelevant outliers.

◮ Robustifies classical principal component analysis.

◮ Carries important information in certain applications; e.g. moving objects in surveillance video.

◮ Robust principal component pursuit:
$$\underbrace{Z}_{\text{data}} = \underbrace{A}_{\text{low-rank}} + \underbrace{B}_{\text{sparse}} + \underbrace{N}_{\text{noise}}$$

◮ Introduced in [Candès, Li, Ma, and Wright, '11], [Chandrasekaran, Sanghavi, Parrilo, and Willsky, '11].


SLIDE 4

Convex-relaxation approach.

◮ A popular (convex) variational model:
$$\min_{A,B}\ \|A\|_{\text{nuclear}} + \lambda\|B\|_{\ell^1} \quad \text{s.t.} \quad \|A + B - Z\| \le \varepsilon.$$

◮ Considered in [Candès, Li, Ma, and Wright, '11], [Chandrasekaran, Sanghavi, Parrilo, and Willsky, '11], ...

◮ $\mathrm{rank}(A)$ relaxed by the nuclear norm; $\|B\|_0$ relaxed by the $\ell^1$-norm.

◮ Numerical solvers: proximal gradient method, augmented Lagrangian method, ... Efficiency is constrained by the SVD in full dimension required at each iteration (see the sketch below).
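To make the bottleneck concrete, here is a minimal sketch (Python with numpy; the function name and the threshold parameter `tau` are illustrative, not from the talk) of the singular value thresholding step, i.e. the nuclear-norm proximal operator such solvers evaluate at every iteration:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: prox of tau * nuclear norm.

    The full SVD below is what dominates the per-iteration cost
    for large matrices."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt  # soft-threshold singular values
```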


SLIDE 5

Manifold-constrained least-squares model.

◮ Our variational model:
$$\min\ \tfrac12\|A + B - Z\|^2 \quad \text{s.t.} \quad A \in \mathcal{M}(r) := \{A \in \mathbb{R}^{m\times n} : \mathrm{rank}(A) \le r\},\quad B \in \mathcal{N}(s) := \{B \in \mathbb{R}^{m\times n} : \|B\|_0 \le s\}.$$

◮ Our goal is to develop an algorithm that:

◮ globally converges to a stationary point (often a local minimizer);

◮ provides exact decomposition with high probability for noiseless data;

◮ outperforms solvers based on the convex-relaxation approach, especially at large scales.


SLIDE 6

Existence of solution and optimality condition.

◮ A little quadratic regularization ($0 < \mu \ll 1$) is included for the (theoretical) sake of existence of a solution; i.e.
$$\min f(A, B) := \tfrac12\|A + B - Z\|^2 + \tfrac{\mu}{2}\|A\|^2 \quad \text{s.t.} \quad (A, B) \in \mathcal{M}(r) \times \mathcal{N}(s).$$
In numerics, choosing $\mu = 0$ seems fine.

◮ Stationarity condition as variational inequalities:
$$\langle \Delta, (1 + \mu)A^* + B^* - Z \rangle \ge 0 \quad \text{for any } \Delta \in T_{\mathcal{M}(r)}(A^*),$$
$$\langle \Delta, A^* + B^* - Z \rangle \ge 0 \quad \text{for any } \Delta \in T_{\mathcal{N}(s)}(B^*).$$
$T_{\mathcal{M}(r)}(A^*)$ and $T_{\mathcal{N}(s)}(B^*)$ refer to tangent cones.


SLIDE 7

Constraints as Riemannian manifolds.

◮ $\mathcal{M}(r)$ is a Riemannian manifold around $A^*$ if $\mathrm{rank}(A^*) = r$; $\mathcal{N}(s)$ is a Riemannian manifold around $B^*$ if $\|B^*\|_0 = s$.

◮ The optimality condition reduces to:
$$P_{T_{\mathcal{M}(r)}(A^*)}\big((1+\mu)A^* + B^* - Z\big) = 0, \qquad P_{T_{\mathcal{N}(s)}(B^*)}\big(A^* + B^* - Z\big) = 0,$$
where $P_{T_{\mathcal{M}(r)}(A^*)}$ and $P_{T_{\mathcal{N}(s)}(B^*)}$ are orthogonal projections onto subspaces.

◮ Tangent space formulae:
$$T_{\mathcal{M}(r)}(A^*) = \big\{UMV^\top + U_pV^\top + UV_p^\top : A^* = U\Sigma V^\top \text{ as compact SVD},\ M \in \mathbb{R}^{r\times r},\ U_p \in \mathbb{R}^{m\times r},\ U_p^\top U = 0,\ V_p \in \mathbb{R}^{n\times r},\ V_p^\top V = 0\big\},$$
$$T_{\mathcal{N}(s)}(B^*) = \{\Delta \in \mathbb{R}^{m\times n} : \mathrm{supp}(\Delta) \subset \mathrm{supp}(B^*)\}.$$


SLIDE 8

A conceptual alternating minimization scheme.

Initialize $A^0 \in \mathcal{M}(r)$, $B^0 \in \mathcal{N}(s)$. Set $k := 0$ and iterate (see the sketch below):

1. $A^{k+1} \approx \arg\min_{A \in \mathcal{M}(r)} \tfrac12\|A + B^k - Z\|^2 + \tfrac{\mu}{2}\|A\|^2$.
2. $B^{k+1} \approx \arg\min_{B \in \mathcal{N}(s)} \tfrac12\|A^{k+1} + B - Z\|^2$.
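A minimal sketch of this outer loop (Python with numpy; `prox_rank_r` and `prox_card_s` are the global metric projections discussed on the following slides, and the stopping rule and parameter names are illustrative assumptions):

```python
import numpy as np

def prox_rank_r(X, r):
    """Metric projection onto M(r): best rank-r approximation (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def prox_card_s(X, s):
    """Metric projection onto N(s): keep the s largest-magnitude entries."""
    B = np.zeros_like(X)
    idx = np.unravel_index(np.argsort(np.abs(X), axis=None)[-s:], X.shape)
    B[idx] = X[idx]
    return B

def alternating_minimization(Z, r, s, mu=0.0, max_iter=100, tol=1e-8):
    A, B = prox_rank_r(Z, r), np.zeros_like(Z)
    for _ in range(max_iter):
        # step 1: the low-rank subproblem reduces to P_{M(r)}((Z - B)/(1 + mu))
        A_new = prox_rank_r((Z - B) / (1.0 + mu), r)
        # step 2: the sparse subproblem reduces to P_{N(s)}(Z - A_new)
        B_new = prox_card_s(Z - A_new, s)
        done = np.linalg.norm(A_new - A) + np.linalg.norm(B_new - B) < tol
        A, B = A_new, B_new
        if done:
            break
    return A, B
```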

Theorem (sufficient decrease + stationarity ⇒ convergence)

Let $\{(A^k, B^k)\}$ be generated as above. Suppose that there exist $\delta > 0$, $\varepsilon_a^k \downarrow 0$, and $\varepsilon_b^k \downarrow 0$ such that for all $k$:
$$f(A^{k+1}, B^k) \le f(A^k, B^k) - \delta\|A^{k+1} - A^k\|^2,$$
$$f(A^{k+1}, B^{k+1}) \le f(A^{k+1}, B^k) - \delta\|B^{k+1} - B^k\|^2,$$
$$\langle \Delta, (1+\mu)A^{k+1} + B^k - Z \rangle \ge -\varepsilon_a^k\|\Delta\| \quad \text{for any } \Delta \in T_{\mathcal{M}(r)}(A^{k+1}),$$
$$\langle \Delta, A^{k+1} + B^{k+1} - Z \rangle \ge -\varepsilon_b^k\|\Delta\| \quad \text{for any } \Delta \in T_{\mathcal{N}(s)}(B^{k+1}).$$

Then any non-degenerate limit point $(A^*, B^*)$, i.e. $\mathrm{rank}(A^*) = r$ and $\|B^*\|_0 = s$, satisfies the first-order optimality condition.


SLIDE 9

Sparse matrix subproblem.

◮ The global solution $P_{\mathcal{N}(s)}(Z - A^{k+1})$ (as metric projection) can be efficiently calculated by sorting; see the sketch after this list.

◮ The global solution may not necessarily fulfill the sufficient decrease condition.

◮ Whenever necessary, safeguard by a local solution:
$$B^{k+1}_{ij} = \begin{cases} (Z - A^{k+1})_{ij}, & \text{if } B^k_{ij} \ne 0, \\ 0, & \text{otherwise}. \end{cases}$$

◮ Given non-degeneracy of $B^{k+1}$, i.e. $\|B^{k+1}\|_0 = s$, exact stationarity holds.
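A minimal sketch of this subproblem step (Python with numpy; the function name, the objective callable `f`, and the constant `delta` are illustrative, not from the talk):

```python
import numpy as np

def sparse_step(Z, A_next, B_prev, s, f, delta=1e-4):
    """Global projection onto N(s) by sorting; fall back to the
    support-preserving local solution if sufficient decrease fails."""
    R = Z - A_next
    # global solution: keep the s largest-magnitude entries of the residual
    B_glob = np.zeros_like(R)
    idx = np.unravel_index(np.argsort(np.abs(R), axis=None)[-s:], R.shape)
    B_glob[idx] = R[idx]
    decrease = f(A_next, B_prev) - f(A_next, B_glob)
    if decrease >= delta * np.linalg.norm(B_glob - B_prev) ** 2:
        return B_glob
    # safeguard: local solution supported on supp(B_prev)
    return np.where(B_prev != 0, R, 0.0)
```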


SLIDE 10

Low-rank matrix subproblem: a Riemannian perspective.

◮ Global solution $P_{\mathcal{M}(r)}\big(\tfrac{1}{1+\mu}(Z - B^k)\big)$ as metric projection:

◮ available due to the Eckart-Young theorem; i.e.
$$\frac{1}{1+\mu}(Z - B^k) = \sum_{j=1}^{n} \sigma_j u_j v_j^\top \;\Rightarrow\; P_{\mathcal{M}(r)}\Big(\frac{1}{1+\mu}(Z - B^k)\Big) = \sum_{j=1}^{r} \sigma_j u_j v_j^\top.$$

◮ but requires an SVD in full dimension, which is expensive for large-scale problems (e.g. $m, n \ge 2000$).

◮ Alternatively resolved by a single Riemannian optimization step on the matrix manifold; the global projection is sketched after this list for reference.

◮ Riemannian optimization has been applied to low-rank matrix/tensor problems; see [Simonsson and Eldén, '10], [Savas and Lim, '10], [Vandereycken, '13], ...

◮ Our goal: the subproblem solver should activate the convergence criteria, i.e. sufficient decrease + stationarity.
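For reference, a minimal sketch of the global metric projection (Python with scipy; `svds` extracts only the $r$ leading singular triplets of the full-dimensional matrix, so it mitigates but does not remove the SVD cost at this scale):

```python
import numpy as np
from scipy.sparse.linalg import svds

def project_rank_r(X, r):
    """P_{M(r)}(X) via the r leading singular triplets (Eckart-Young)."""
    U, s, Vt = svds(X, k=r)  # Lanczos-based partial SVD of the full matrix
    return (U * s) @ Vt

# low-rank subproblem, global solution: A_next = project_rank_r((Z - B_k)/(1 + mu), r)
```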


SLIDE 11

Riemannian optimization: an overview.

Fig.: a smooth objective $f : \mathcal{M} \to \mathbb{R}$ defined on a Riemannian manifold $\mathcal{M}$.

◮ References: [Smith, ’93], [Edelman, Arias, and Smith, ’98],

[Absil, Mahony, and Sepulchre, ’08], ...

◮ Why Riemannian optimization?

◮ Local homeomorphism is computationally infeasible/expensive.

◮ Intrinsically low dimensionality of the underlying manifold.

◮ Further dimension reduction via quotient manifold.

◮ Typical Riemannian manifolds in applications:

◮ finite-dimensional (matrix manifold): Stiefel manifold, Grassmann manifold, fixed-rank matrix manifold, ...

◮ infinite-dimensional: shape/curve spaces, ...

SLIDE 12

Riemannian optimization: a conceptual algorithm.

Fig.: retraction on $\bar{\mathcal{M}}(r)$: a tangent step $\Delta^k \in T_{\bar{\mathcal{M}}(r)}(A^k)$ is mapped back onto the manifold by $\mathrm{retract}_{\bar{\mathcal{M}}(r)}(A^k, \Delta^k)$ (figure after [Absil, Mahony, and Sepulchre, '08]).

At the current iterate:

1. Build a quadratic model in the tangent space using the Riemannian gradient and Riemannian Hessian.
2. Based on the quadratic model, build a tangential search path.
3. Perform a backtracking path search via retraction to determine the step size.
4. Generate the next iterate.


SLIDE 13

Riemannian gradient and Hessian.

$\bar{\mathcal{M}}(r) := \{A : \mathrm{rank}(A) = r\}$; $f_A^k : A \in \bar{\mathcal{M}}(r) \mapsto f(A, B^k)$.

◮ The Riemannian gradient, $\mathrm{grad} f_A^k(A) \in T_{\bar{\mathcal{M}}(r)}(A)$, is defined s.t.
$$\langle \mathrm{grad} f_A^k(A), \Delta \rangle = Df_A^k(A)[\Delta], \quad \forall \Delta \in T_{\bar{\mathcal{M}}(r)}(A);$$
explicitly, $\mathrm{grad} f_A^k(A) = P_{T_{\bar{\mathcal{M}}(r)}(A)}(\nabla f_A^k(A))$.

◮ The Riemannian Hessian, $\mathrm{Hess} f_A^k(A) : T_{\bar{\mathcal{M}}(r)}(A) \to T_{\bar{\mathcal{M}}(r)}(A)$, is defined s.t. $\mathrm{Hess} f_A^k(A)[\Delta] = \nabla_\Delta \mathrm{grad} f_A^k(A)$, $\forall \Delta \in T_{\bar{\mathcal{M}}(r)}(A)$; explicitly,
$$\mathrm{Hess} f_A^k(A)[\Delta] = (I - UU^\top)\nabla f_A^k(A)(I - VV^\top)\Delta^\top U\Sigma^{-1}V^\top + U\Sigma^{-1}V^\top\Delta^\top(I - UU^\top)\nabla f_A^k(A)(I - VV^\top) + (1 + \mu)\Delta.$$

See, e.g., [Vandereycken, '12].


SLIDE 14

Dogleg search path and projective retraction.

Fig.: (left) trust region in the tangent space, with the dogleg path interpolating between the unconstrained minimizer along $-g$ and the full step, approximating the optimal trajectory $\Delta(\sigma)$; (right) retraction of the tangent step $\Delta^k$ back onto $\bar{\mathcal{M}}(r)$.

◮ The “dogleg” path $\Delta^k(\tau^k)$ approximates the optimal trajectory of the tangential trust-region subproblem (left figure):
$$\min\ f_A^k(A^k) + \langle g^k, \Delta \rangle + \tfrac12\langle \Delta, H^k[\Delta] \rangle \quad \text{s.t.} \quad \Delta \in T_{\bar{\mathcal{M}}(r)}(A^k),\ \|\Delta\| \le \sigma.$$

◮ Metric projection as retraction (right figure):
$$\mathrm{retract}_{\bar{\mathcal{M}}(r)}(A^k, \Delta^k(\tau^k)) = P_{\bar{\mathcal{M}}(r)}(A^k + \Delta^k(\tau^k)).$$
Computationally efficient: only a “reduced” SVD of a $2r \times 2r$ matrix is needed (see the sketch below)!
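A minimal sketch of this retraction (Python with numpy; the factored tangent-vector representation $\Delta = UMV^\top + U_pV^\top + UV_p^\top$ is taken from the tangent-space formula on SLIDE 7, while the function signature and variable names are ours):

```python
import numpy as np

def retract(U, sigma, V, M, Up, Vp, r):
    """Project A + Delta onto the rank-r manifold, where
    A = U diag(sigma) V^T and Delta = U M V^T + Up V^T + U Vp^T.
    Only two thin QRs and an SVD of a 2r-by-2r core are needed."""
    Qu, Ru = np.linalg.qr(Up)  # Up = Qu @ Ru, with Qu orthogonal to U
    Qv, Rv = np.linalg.qr(Vp)  # Vp = Qv @ Rv, with Qv orthogonal to V
    # A + Delta = [U Qu] @ K @ [V Qv]^T with the small 2r-by-2r core K:
    K = np.block([[np.diag(sigma) + M, Rv.T],
                  [Ru,                 np.zeros((r, r))]])
    Uk, sk, Vkt = np.linalg.svd(K)  # SVD of the 2r-by-2r core only
    U_new = np.hstack([U, Qu]) @ Uk[:, :r]
    V_new = np.hstack([V, Qv]) @ Vkt[:r, :].T
    return U_new, sk[:r], V_new  # factors of the retracted rank-r iterate
```

Since $[U\ Q_u]$ and $[V\ Q_v]$ have orthonormal columns, truncating the SVD of the small core gives exactly the best rank-$r$ approximation of $A^k + \Delta^k$, with no full-dimension SVD.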


SLIDE 15

Low-rank matrix subproblem: projected dogleg step.

Given $A^k \in \bar{\mathcal{M}}(r)$, $B^k \in \mathcal{N}(s)$:

1. Compute $g^k$, $H^k$, and build the dogleg search path $\Delta^k(\tau^k)$ in $T_{\bar{\mathcal{M}}(r)}(A^k)$.
2. Whenever non-positive definiteness of $H^k$ is detected, replace the dogleg search path by the line search path along the steepest descent direction, i.e. $\Delta^k(\tau^k) = -\tau^k g^k$.
3. Perform a backtracking path/line search; i.e. find the largest step size $\tau^k \in \{2, 3/2, 1, 1/2, 1/4, 1/8, \ldots\}$ s.t. the sufficient decrease condition is satisfied:
$$f_A^k(A^k) - f_A^k(P_{\bar{\mathcal{M}}(r)}(A^k + \Delta^k(\tau^k))) \ge \delta\|A^k - P_{\bar{\mathcal{M}}(r)}(A^k + \Delta^k(\tau^k))\|^2.$$
4. Return $A^{k+1} = P_{\bar{\mathcal{M}}(r)}(A^k + \Delta^k(\tau^k))$.

A sketch of the backtracking search follows below.
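A minimal sketch of the backtracking search in step 3 (Python with numpy; `f`, `path`, and `retract_fn` are hypothetical callables standing in for $f_A^k$, the path $\tau \mapsto \Delta^k(\tau)$, and the projective retraction):

```python
import numpy as np

def backtracking_search(A, f, path, retract_fn, delta=1e-4):
    """Try step sizes 2, 3/2, 1, 1/2, 1/4, ... until the sufficient
    decrease condition holds; return the retracted next iterate."""
    f_A = f(A)
    taus = [2.0, 1.5] + [2.0 ** (-i) for i in range(30)]
    for tau in taus:
        A_trial = retract_fn(A + path(tau))  # metric-projection retraction
        if f_A - f(A_trial) >= delta * np.linalg.norm(A - A_trial) ** 2:
            return A_trial
    raise RuntimeError("sufficient decrease not reached")
```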


SLIDE 16

Low-rank matrix subproblem: convergence theory.

◮ Backtracking path search:

◮ The sufficient decrease condition can always be fulfilled after finitely many trials on $\tau^k$.

◮ Any accumulation point of $\{A^k\}$ is stationary.

◮ Further assume $\mathrm{Hess} f(A^*, B^*)\big|_{\mu=0} \succ 0$ at a non-degenerate accumulation point $(A^*, B^*)$. Then:

◮ Tangent-space transversality holds, i.e.
$$T_{\bar{\mathcal{M}}(r)}(A^*) \cap T_{\mathcal{N}(s)}(B^*) = \{0\}.$$

◮ Contractivity of $P_{T_{\bar{\mathcal{M}}(r)}(A^*)} \circ P_{T_{\mathcal{N}(s)}(B^*)}$: $\exists \kappa \in [0, 1)$ s.t.
$$\|(P_{T_{\bar{\mathcal{M}}(r)}(A^*)} \circ P_{T_{\mathcal{N}(s)}(B^*)})(\Delta)\| \le \kappa\|\Delta\|.$$

◮ q-linear convergence of $\{A^k\}$ towards stationarity:
$$\limsup_{k \to \infty} \frac{\|A^{k+1} - A^*\|}{\|A^k - A^*\|} \le \kappa.$$


SLIDE 17

Numerical implementation.

◮ Trimming: adaptive tuning of rank $r^{k+1}$ and cardinality $s^{k+1}$ based on the current iterate $(A^k, B^k)$:

◮ k-means clustering on the (nonzero) singular values of $A^k$ in logarithmic scale (see the sketch below);

◮ hard thresholding on the entries of $B^k$.
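A minimal sketch of the rank-trimming heuristic (Python with numpy/scipy; the two-cluster k-means on log-scale singular values follows the bullet above, while the function name, tolerance, and tie-breaking rule are our assumptions, not the talk's exact recipe):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def trim_rank(A, tol=1e-12):
    """2-means clustering of log10 singular values; the cluster with the
    larger centroid is taken as significant, its size as the trimmed rank."""
    s = np.linalg.svd(A, compute_uv=False)
    logs = np.log10(s[s > tol]).reshape(-1, 1)
    centroids, labels = kmeans2(logs, 2, minit="++", seed=0)
    significant = np.argmax(centroids.ravel())
    return int(np.sum(labels == significant))  # trimmed rank r^{k+1}
```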

◮ q-linear convergence confirmed numerically:

Fig.: semilogarithmic convergence histories: (a) convergence of $\{A^k\}$, $\|A^k - A^*\|/\|A^*\|$ vs. iteration; (b) convergence of $\{B^k\}$, $\|B^k - B^*\|/\|B^*\|$ vs. iteration.


SLIDE 18

Comparison with augmented Lagrangian method (m = n = 2000).

Fig.: performance vs. CPU time for AMS, AMS#, fSVD-ALM, and pSVD-ALM: (a) relative error of $\{A^k\}$; (b) relative error of $\{B^k\}$; (c) phase transition of $\{A^k\}$ ($\mathrm{rank}(A^k)/n$); (d) phase transition of $\{B^k\}$ ($\|B^k\|_0/n^2$).


SLIDE 19

Application to surveillance video.

◮ Problem settings:

◮ A sequence of 200 frames taken from a surveillance video at an airport.

◮ Each frame is a gray image of resolution 144 × 176.

◮ Stack the 3D array into a 25344 × 200 matrix; a minimal sketch follows below.
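A minimal sketch of the stacking step (Python with numpy; `frames` is a hypothetical stand-in for the decoded 200-frame grayscale sequence):

```python
import numpy as np

# hypothetical stand-in for 200 decoded grayscale frames of size 144 x 176
frames = np.random.rand(200, 144, 176)

# vectorize each frame into one column: Z becomes the 25344 x 200 data matrix
Z = frames.reshape(200, -1).T
assert Z.shape == (144 * 176, 200)  # (25344, 200)
```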

◮ Results:

◮ CPU time: AMS 39.4 s; ALM 124.4 s.

◮ Visual comparison.