A 2-phase augmented Lagrangian approach for large scale matrix - PowerPoint PPT Presentation

A 2-phase augmented Lagrangian approach for large scale matrix optimization
Defeng Sun
Department of Mathematics, National University of Singapore
September 5, 2014
(Presentation at 2014 Workshop on Optimization for Modern Computation, Beijing University)
Joint work with: Kim-Chuan Toh, National University of Singapore
Students/postdocs: Caihua Chen (Nanjing), Junfeng Yang (Nanjing), Chao Ding (CAS), Kaifeng Jiang (DBS), Yongjin Liu (Shenyang Aerospace U), Chengjing Wang (Southwest Jiaotong U), Liuqin Yang (NUS), Xudong Li (NUS), Xinyuan Zhao (Beijing U Tech.)

Outline
Matrix optimization problem (MOP)
Examples: linear semidefinite programming (SDP), etc.
General framework of the proximal-point algorithm (PPA)
2-phase PPA applied to SDP and SDP+ (matrix variable is positive semidefinite and nonnegative)
A majorized semismooth Newton-CG (SNCG) method for solving PPA subproblems
SDPNAL+: practical implementation of PPA for SDP+
Numerical experiments

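Convex conic matrix optimization
$\mathcal{X} = \mathbb{R}^{p \times n}$ or $\mathcal{S}^n$ ($n \times n$ symmetric matrices), endowed with the trace inner product $\langle \cdot, \cdot \rangle$ and the Frobenius norm $\| \cdot \|$.
\[ \text{(MOP)} \quad \min \{ f(X) \mid \mathcal{A}(X) - b \in \mathcal{Q},\ X \in \mathcal{X} \} \]
$f : \mathcal{X} \to (-\infty, \infty]$ is a proper closed convex function
$\mathcal{Q}$ is a closed convex cone in $\mathbb{R}^m$
$b \in \mathbb{R}^m$
$\mathcal{A} : \mathcal{X} \to \mathbb{R}^m$ is a given (onto) linear map, e.g., $\mathcal{A}(X) = \mathrm{diag}(X)$
Define $\mathcal{A}^*$ = the adjoint of $\mathcal{A}$
Define the dual cone $\mathcal{Q}^* = \{ X \in \mathcal{X} \mid \langle Y, X \rangle \ge 0 \ \forall\, Y \in \mathcal{Q} \}$.
Define the indicator function
\[ \delta_{\mathcal{Q}}(X) = \begin{cases} 0 & \text{if } X \in \mathcal{Q} \\ \infty & \text{otherwise} \end{cases} \]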
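A minimal numerical sketch of the adjoint for the slide's example map $\mathcal{A}(X) = \mathrm{diag}(X)$: its adjoint places a vector on the diagonal, and the defining identity $\langle \mathcal{A}(X), y \rangle = \langle X, \mathcal{A}^*(y) \rangle$ can be checked directly. The function names and the choice $\mathcal{Q} = \mathbb{R}^m_+$ for the indicator are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def A(X):
    """Example map from the slide: A(X) = diag(X), from S^n to R^n."""
    return np.diag(X).copy()

def A_adj(y):
    """Adjoint of A: puts the vector y on the diagonal of an n x n matrix."""
    return np.diag(y)

def indicator_nonneg(z, tol=1e-12):
    """Indicator of Q = R^m_+ (a hypothetical choice of cone Q)."""
    return 0.0 if np.all(z >= -tol) else np.inf

rng = np.random.default_rng(0)
n = 4
G = rng.standard_normal((n, n))
X = (G + G.T) / 2                    # random symmetric matrix
y = rng.standard_normal(n)

lhs = float(A(X) @ y)                # <A(X), y> in R^n
rhs = float(np.sum(X * A_adj(y)))    # trace inner product <X, A*(y)>
print(abs(lhs - rhs))                # should be at rounding-error level
```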

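Dual of MOP
Define
\[ f^*(Z) = \sup_{X \in \mathcal{X}} \{ \langle Z, X \rangle - f(X) \} \quad \text{(conjugate function)} \]
\[ \partial f(X) = \mathrm{conv}\{ \text{subgradients of } f \text{ at } X \} \quad \text{(subdifferential)} \]
The dual problem of (MOP) is given by
\[ \max_{y \in \mathcal{Q}^*} \ \langle b, y \rangle - f^*(\mathcal{A}^* y) \]
The KKT conditions for (MOP) are:
\[ \mathcal{A} X - b \in \mathcal{Q}, \quad y \in \mathcal{Q}^*, \quad \mathcal{A}^* y \in \partial f(X) \]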

MOP covers many important classes of problems
$\mathcal{S}^n_+$ = cone of positive semidefinite matrices. Write $X \succeq 0$ if $X \in \mathcal{S}^n_+$.
MOP includes linear semidefinite programming (SDP):
\[ \text{(SDP)} \quad \min \{ \langle C, X \rangle \mid \mathcal{A}(X) = b,\ X \in \mathcal{S}^n_+ \} = \min \{ f(X) := \langle C, X \rangle + \delta_{\mathcal{S}^n_+}(X) \mid \mathcal{A}(X) - b \in \mathcal{Q} := \{0\}^m \} \]
\[ \delta_{\mathcal{S}^n_+}(X) = \text{indicator function of } \mathcal{S}^n_+ = \begin{cases} 0 & \text{if } X \in \mathcal{S}^n_+ \\ \infty & \text{otherwise} \end{cases} \]
SDP is solvable by powerful interior-point methods if $n$ and $m$ are not too large, say, $n \le 2{,}000$, $m \le 10{,}000$.
Current research interests focus on $n \le 10{,}000$ but $m \gg 10{,}000$.

SDP and MOP have lots of applications
SDP (and more generally MOP) is a powerful modelling tool! Applications are growing rapidly, and driving developments in algorithms and software.
LMI in control
Combinatorial optimization
Robust optimization: project management, revenue management
Polynomial optimization: option pricing, queueing systems
Moment problems, applied probability
Engineering: signal processing, communication, structural optimization, computer vision
Statistics/Finance: correlation/covariance matrix estimation
Machine learning: kernel estimation, dimensionality reduction/manifold unfolding
Euclidean metric embedding: sensor network localization, molecular conformation
Quantum chemistry, quantum information
Many others ...

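Maximum stable set problem
Given a graph $G = (V, \mathcal{E})$, a stable set $S$ is a subset of $V$ such that no two vertices in $S$ are adjacent.
Maximum stable set problem: find $S$ with maximum cardinality.
Let
\[ x_i = \begin{cases} 1 & \text{if } i \in S \\ 0 & \text{otherwise} \end{cases} \quad \Rightarrow \quad |S| = \sum_{i=1}^n x_i . \]
A common formulation of the max-stable-set problem:
\[ \alpha(G) := \max \Big\{ \tfrac{1}{|S|} \sum_{ij} x_i x_j \ \Big|\ x_i x_j = 0 \ \forall (i,j) \in \mathcal{E},\ x \in \{0,1\}^n \Big\} \]
Setting $X := x x^T / |S|$ gives
\[ \max \{ \langle E, X \rangle \mid X_{ij} = 0 \ \forall (i,j) \in \mathcal{E},\ \langle I, X \rangle = 1 \} \]
where $E$ denotes the matrix of all ones, so that $\langle E, X \rangle = \frac{1}{|S|} \sum_{ij} x_i x_j$.
SDP relaxation: $X = x x^T / |S| \Rightarrow X \succeq 0$, get
\[ \theta(G) := \max \{ \langle E, X \rangle : X_{ij} = 0 \ \forall (i,j) \in \mathcal{E},\ \langle I, X \rangle = 1,\ X \succeq 0 \} \]
$\theta_+(G)$: the same problem with the $n(n+1)/2$ additional constraints $X \ge 0$.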
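As a small sanity check of the lifting $X = xx^T/|S|$, the sketch below builds $X$ for a stable set in a 5-cycle and evaluates the quantities that appear in the relaxation. The graph, the set $S$, and the reading of $E$ as the all-ones matrix are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical example graph: the 5-cycle on vertices 0..4.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
S = [0, 2]                       # a stable set: 0 and 2 are not adjacent

x = np.zeros(n)
x[S] = 1.0                       # 0/1 indicator vector of S
X = np.outer(x, x) / len(S)      # lifted variable X = x x^T / |S|

E = np.ones((n, n))              # all-ones matrix (assumed meaning of E)
objective = np.sum(E * X)        # <E, X>, equals |S| for this X
trace = np.trace(X)              # <I, X>, equals 1
edge_entries = [X[i, j] for i, j in edges]   # all zero on edges
min_eig = np.linalg.eigvalsh(X).min()        # X is positive semidefinite

print(objective, trace, edge_entries, min_eig)
```

Any feasible $X$ of this rank-one form has objective value $|S|$, which is why the relaxed optimum $\theta(G)$ is an upper bound on $\alpha(G)$.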

Quadratic assignment problem (QAP)
Assign $n$ facilities to $n$ locations [Koopmans and Beckmann (1957)]
$A = (a_{ij})$ where $a_{ij}$ = flow from facility $i$ to facility $j$
$B = (b_{kl})$ where $b_{kl}$ = distance from location $k$ to location $l$
\[ \text{cost of assignment } \pi = \sum_{i=1}^n \sum_{j=1}^n a_{ij}\, b_{\pi(i)\pi(j)} \]
\[ \min_P \{ \langle B \otimes A, \mathrm{vec}(P)\mathrm{vec}(P)^T \rangle \mid P \text{ is an } n \times n \text{ permutation matrix} \} \]
SDP+ relaxation [Povh and Rendl, 09]: relax $\mathrm{vec}(P)\mathrm{vec}(P)^T$ to the $n^2 \times n^2$ variable $X \in \mathcal{S}^{n^2}_+$ and $X \ge 0$:
\[ \text{(QAP)} \quad \min \{ \langle B \otimes A, X \rangle \mid \mathcal{A}(X) - b = 0,\ X \in \mathcal{S}^{n^2}_+,\ X \ge 0 \} \]
where the linear constraints (with $m = 3n(n+1)/2$) encode the condition $P^T P = I_n$, $P \ge 0$.
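The identity between the assignment cost and the lifted objective $\langle B \otimes A, \mathrm{vec}(P)\mathrm{vec}(P)^T \rangle$ can be checked numerically. This sketch assumes the conventions $P_{i,\pi(i)} = 1$ and column-stacking $\mathrm{vec}$; the slide does not state them, and other conventions would need the Kronecker factors rearranged accordingly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.random((n, n))           # flows between facilities
B = rng.random((n, n))           # distances between locations
pi = rng.permutation(n)          # facility i is assigned to location pi[i]

# Direct cost: sum_{i,j} a_ij * b_{pi(i) pi(j)}
direct = sum(A[i, j] * B[pi[i], pi[j]] for i in range(n) for j in range(n))

# Permutation matrix with P[i, pi(i)] = 1 (assumed convention)
P = np.zeros((n, n))
P[np.arange(n), pi] = 1.0

v = P.flatten(order="F")         # column-stacking vec(P)
lifted = v @ np.kron(B, A) @ v   # <B kron A, vec(P) vec(P)^T>

print(direct, lifted)
```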

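Relaxations of rank-1 tensor approximations
Consider a symmetric 4-tensor [Nie, Lasserre, Lim, De Lathauwer et al]:
\[ f(x) = \sum_{1 \le i,j,k,l \le n} F_{ijkl}\, x_i x_j x_k x_l \quad \to \quad F \approx \lambda\, (u \otimes u \otimes u \otimes u) \]
for some scalar $\lambda$ and $u \in \mathbb{R}^n$ with $\|u\| = 1$. Need to solve:
\[ \max_{x \in \mathbb{R}^n} \{ \pm f(x) \mid g(x) := x_1^2 + \cdots + x_n^2 = 1 \} . \]
Let $[x]_d$ = monomial vector of degree at most $d$.
\[ [x]_d [x]_d^T = \sum_{|\alpha| \le 2d} A_\alpha x^\alpha \quad \Rightarrow \quad M_d(y) := \sum_{|\alpha| \le 2d} A_\alpha y_\alpha \]
\[ f(x) = \sum_\alpha f_\alpha x^\alpha \ \Rightarrow \ \langle f, y \rangle, \qquad g(x) = \sum_\alpha g_\alpha x^\alpha \ \Rightarrow \ \langle g, y \rangle \]
SDP relaxation is given by:
\[ \max \{ \langle f, y \rangle \mid \langle g, y \rangle = 1,\ M_d(y) \succeq 0 \} \]
Relaxation is tight if $\mathrm{rank}(M_d(y^*)) = 1$.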

Molecular conformation and sensor localization
Given sparse and noisy distance data $\{ d_{ij} \mid (i,j) \in \mathcal{E} \}$ for $n$ atoms, find coordinates $v_1, \ldots, v_n$ in $\mathbb{R}^3$ such that $\| v_i - v_j \| \approx d_{ij}$.
Typically $\mathcal{E}$ consists of 20–50% of all pairs of atoms which are $\le 6$ Å apart.
Consider the model:
\[ \min \Big\{ \sum_{(i,j) \in \mathcal{E}} \big| \| v_i - v_j \|^2 - d_{ij}^2 \big| \ \Big|\ v_1, \ldots, v_n \in \mathbb{R}^3 \Big\} \]
Let $V = [v_1, \ldots, v_n]$ and $X = V^T V$. Relaxing $X = V^T V$ to $X \succeq 0$ leads to an SDP:
\[ \min_X \Big\{ \sum_{(i,j) \in \mathcal{E}} \big| \langle A_{ij}, X \rangle - d_{ij}^2 \big| \ :\ \langle E, X \rangle = 0,\ X \succeq 0 \Big\} \]
where $A_{ij} = e_i e_i^T + e_j e_j^T - e_i e_j^T - e_j e_i^T$.
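The lifting step rests on the identity $\langle A_{ij}, V^T V \rangle = \| v_i - v_j \|^2$, which the following sketch verifies on random coordinates. It also checks that $\langle E, X \rangle = 0$ holds once the points are centered at the origin; reading $E$ as the all-ones matrix is an assumption here, since the slide does not define it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
V = rng.standard_normal((3, n))          # columns are v_1..v_n in R^3
V = V - V.mean(axis=1, keepdims=True)    # center points (distances unchanged)
X = V.T @ V                              # Gram matrix X = V^T V

def A_mat(i, j, n):
    """A_ij = e_i e_i^T + e_j e_j^T - e_i e_j^T - e_j e_i^T = (e_i - e_j)(e_i - e_j)^T."""
    d = np.zeros(n)
    d[i], d[j] = 1.0, -1.0
    return np.outer(d, d)

i, j = 1, 3
lhs = np.sum(A_mat(i, j, n) * X)                  # <A_ij, X>
rhs = np.linalg.norm(V[:, i] - V[:, j]) ** 2      # squared distance
centering = np.sum(np.ones((n, n)) * X)           # <E, X> with E all ones (assumed)

print(lhs, rhs, centering)
```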

Protein molecule 1PTQ from Protein Data Bank:
number of atoms $n = 402$
number of pairwise distances given $|\mathcal{E}| \approx 3700$ (50% of distances $\le 6$ Å $\approx$ 4.5% of all pairwise distances)
[Figure panels: Actual, Reconstructed]

Nuclear norm minimization problem
Given a partially observed matrix $M \in \mathbb{R}^{n \times n}$, find a min-rank matrix $Y \in \mathbb{R}^{n \times n}$ to complete $M$:
\[ \min_{Y \in \mathbb{R}^{n \times n}} \{ \mathrm{rank}(Y) \mid Y_{ij} = M_{ij} \ \forall (i,j) \in \mathcal{E} \} \quad \text{(NP-hard)} \]
[Candes, Parrilo, Recht, Tao, ...] For a given rank-$r$ matrix $M \in \mathbb{R}^{n \times n}$ that satisfies certain properties, if enough entries ($\propto r\, n\, \mathrm{polylog}(n)$) are sampled randomly, then with very high probability, $M$ can be recovered from the following nuclear norm minimization problem:
\[ \min_{Y \in \mathbb{R}^{n \times n}} \{ \| Y \|_* \mid Y_{ij} = M_{ij} \ \forall (i,j) \in \mathcal{E} \} \quad \text{(easier problem, but still nontrivial to solve!)} \]
where $\| Y \|_*$ = sum of singular values of $Y$.
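To make the two objectives concrete, this short sketch computes the rank and the nuclear norm of a random low-rank matrix from its singular values. The matrix size, the rank, and the rank tolerance are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 6, 2
# Random rank-r matrix built as a product of two thin factors
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

sigma = np.linalg.svd(M, compute_uv=False)    # singular values, descending
nuclear_norm = sigma.sum()                    # ||M||_* = sum of singular values
numerical_rank = int(np.sum(sigma > 1e-10 * sigma[0]))

# NumPy exposes the same norm directly via ord="nuc"
print(nuclear_norm, np.linalg.norm(M, "nuc"), numerical_rank)
```

The rank counts the nonzero singular values while the nuclear norm sums them, which is why the latter serves as a convex surrogate for the former.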

Based on partially observed matrix, predict unobserved entries: will customer i like movie j?
[Figure: users × movies rating matrix with entries 1 to 5; "?" marks unobserved entries]

Sparse covariance selection problems
Given i.i.d. observations drawn from an $n$-dimensional Gaussian distribution $N(x, \mu, \Sigma)$, let $\widehat{\Sigma}$ be the sample covariance matrix. Want to estimate $\Sigma$, whose inverse $X := \Sigma^{-1}$ is sparse.
Dempster (1972) proved that $x_i$ and $x_j$ are conditionally independent (given all other $x_k$) if and only if $X_{ij} = 0$.
Typically, we estimate $X$ via the log-likelihood function:
\[ \max \{ \log\det X - \langle \widehat{\Sigma}, X \rangle - \langle W, |X| \rangle \mid X \succ 0 \} \]
where the weighted $L_1$-term is added to encourage sparsity in $X$.
Many papers: d'Aspremont, M. Yuan, Lu, Meinshausen, Bühlmann, Wang-Sun-Toh, Yang-Sun-Toh

Convex quadratic SDP
(MOP) also contains the important case of convex quadratic SDP:
\[ \text{(QSDP)} \quad \min_{X \in \mathcal{S}^n} \Big\{ \tfrac{1}{2} \langle X, \mathcal{Q}(X) \rangle + \langle C, X \rangle \ \Big|\ \mathcal{A}(X) - b = 0,\ X \in \mathcal{S}^n_+ \Big\} \]
$\mathcal{Q} : \mathcal{S}^n \to \mathcal{S}^n$ is a self-adjoint positive semidefinite linear operator.
A well-studied example is the nearest correlation matrix problem, where given data matrix $U \in \mathcal{S}^n$ and weight matrix $W \succ 0$, we want to solve the $W$-weighted NCM problem:
\[ \text{(W-NCM)} \quad \min_X \Big\{ \tfrac{1}{2} \| W (X - U) W \|^2 \ \Big|\ \mathrm{Diag}(X) = 1,\ X \succeq 0 \Big\} . \]
1. The alternating projection method [Higham 02]
2. The quasi-Newton method [Malick 04]
3. An inexact semismooth Newton-CG method [Qi and Sun 06]
4. An inexact interior-point method [Toh, Tütüncü and Todd 07]
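For orientation, here is a minimal sketch of the first approach in the list, alternating projections with Dykstra's correction, for the unweighted case $W = I$. It is an illustration under that simplifying assumption, not the implementation used in the talk; the function names, the fixed iteration count, and the test matrix are all illustrative choices.

```python
import numpy as np

def proj_psd(R):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues."""
    w, Q = np.linalg.eigh(R)
    return (Q * np.maximum(w, 0.0)) @ Q.T

def proj_unit_diag(X):
    """Project onto the affine set of matrices with unit diagonal."""
    Y = X.copy()
    np.fill_diagonal(Y, 1.0)
    return Y

def nearest_corr(U, iters=500):
    """Alternating projections with Dykstra's correction, W = I (assumed)."""
    Y = U.copy()
    dS = np.zeros_like(U)            # Dykstra correction for the PSD step
    for _ in range(iters):
        R = Y - dS
        X = proj_psd(R)
        dS = X - R
        Y = proj_unit_diag(X)
    return Y

# Illustrative symmetric input that is not a correlation matrix
U = np.array([[2.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 2.0]])
Y = nearest_corr(U)
print(np.round(Y, 4))
print(np.linalg.eigvalsh(Y).min())   # close to zero or positive
```

The method needs only an eigenvalue decomposition per iteration but converges at best linearly, which is part of the motivation for the Newton-type methods listed after it.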

$H$-weighted NCM problem
\[ \text{(H-NCM)} \quad \min_X \Big\{ \tfrac{1}{2} \| H \circ (X - U) \|^2 \ \Big|\ \mathrm{Diag}(X) = 1,\ X \succeq 0 \Big\} \]
where $H \in \mathcal{S}^n$ has nonnegative entries and "$\circ$" denotes the Hadamard product.
1. An inexact IPM for convex QSDP [Toh 08]
2. An ALM [Qi and Sun 10]
3. A semismooth Newton-CG ALM for convex quadratic programming over symmetric cones [Zhao 09]
4. A modified alternating direction method for convex quadratically constrained QSDPs [J. Sun and Zhang 10]
