SLIDE 1

Nonconvex Demixing from Bilinear Measurements

Yuanming Shi

SLIDE 2

Outline

 Motivations

  • Blind deconvolution meets blind demixing

 Two vignettes:

  • A: Implicitly regularized Wirtinger flow (Why nonconvex optimization? Implicitly regularized Wirtinger flow)

  • B: Matrix optimization over manifolds (Why manifold optimization? Riemannian optimization for blind demixing)

SLIDE 3

Motivations: Blind deconvolution meets blind demixing

SLIDE 4

Blind deconvolution

 In many science and engineering problems, the observed signal can be modeled as $y = f \ast g$, where $\ast$ is the convolution operator

  • $f$ is a physical signal of interest
  • $g$ is the impulse response of the sensory system

 Applications: astronomy, neuroscience, image processing, computer vision, wireless communications, microscopy data processing,…

 Blind deconvolution: estimate $f$ and $g$ given $y$

SLIDE 5

Image deblurring

 Blurred images due to camera shake can be modeled as a convolution of the latent sharp image and a kernel capturing the motion of the camera

Figure: blurring kernel and natural (sharp) image

How to find the high-resolution image and the blurring kernel simultaneously?

  • Fig. credit: Chi
SLIDE 6

Microscopy data analysis

 Defects: the electronic structure of the material is contaminated by randomly and sparsely distributed “defects”

How to determine the locations and characteristic signatures of the defects?

Figure: doped graphene

  • Fig. credit: Wright
SLIDE 7

Blind demixing

 The received measurement consists of the sum of all convolved signals: $y = \sum_{i=1}^{s} f_i \ast g_i$

 Applications: IoT, dictionary learning, neural spike sorting,…

 Blind demixing: estimate $\{f_i\}$ and $\{g_i\}$ given $y$

Figures: low-latency communication for IoT; convolutional dictionary learning (multi-kernel)

SLIDE 8

Convolutional dictionary learning

 The observation signal is the superposition of several convolutions

Figures: experiment on a synthetic image; experiment on a microscopy image

How to recover multiple kernels and the corresponding activation signals?

  • Fig. credit: Wright
SLIDE 9

Low-latency communications for IoT

 Packet structure: metadata (preamble (PA) and header (H)) and data

 Proposal: transmitters just send overhead-free signals, and the receiver can still extract the information

Figures: long data packet in current wireless systems; short data packet in IoT

How to detect data without channel estimation in multi-user environments?

SLIDE 10

Demixing from bilinear model?

SLIDE 11

Bilinear model

 Translate into the frequency domain…

 Subspace assumptions: $f_i$ and $g_i$ lie in some known low-dimensional subspaces, with low-dimensional representations $h_i \in \mathbb{C}^{K}$ and $x_i \in \mathbb{C}^{N}$, where $K, N \ll L$

 Demixing from bilinear measurements:

$$y_j = \sum_{i=1}^{s} b_j^{\mathsf{H}} h_i \, x_i^{\mathsf{H}} a_{ij}, \qquad j = 1, \dots, L$$

$B = [b_1, \dots, b_L]^{\mathsf{H}}$: partial Fourier basis

SLIDE 12

An equivalent view: low-rank factorization

 Lifting: introduce $M_i = h_i x_i^{\mathsf{H}}$ to linearize the bilinear constraints

 Low-rank matrix optimization problem
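
A possible way to write the lifted problem in the notation above (my formulation, offered as a sketch rather than the slide's exact expression):

$$\min_{\{M_i\}_{i=1}^{s}} \; \sum_{j=1}^{L} \Big| \sum_{i=1}^{s} b_j^{\mathsf{H}} M_i \, a_{ij} - y_j \Big|^2 \quad \text{s.t.} \quad \operatorname{rank}(M_i) = 1, \; i = 1, \dots, s$$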

SLIDE 13


Convex relaxation

 Ling and Strohmer (TIT’2017) proposed to solve the nuclear norm minimization problem:

$$\min_{\{M_i\}} \; \sum_{i=1}^{s} \|M_i\|_{*} \quad \text{s.t.} \quad \sum_{i=1}^{s} b_j^{\mathsf{H}} M_i \, a_{ij} = y_j, \; j = 1, \dots, L$$

  • Sample-efficient: relatively few samples suffice for exact recovery, provided the signals are incoherent w.r.t. the partial Fourier basis $B$

  • Computationally expensive: SDP in the lifting space

Can we solve the nonconvex matrix optimization problem directly?

SLIDE 14


Vignette A: Implicitly regularized Wirtinger flow

SLIDE 15

Why nonconvex optimization?

SLIDE 16

Nonconvex problems are everywhere

 Empirical risk minimization is usually nonconvex

  • low-rank matrix completion
  • blind deconvolution/demixing
  • dictionary learning
  • phase retrieval
  • mixture models
  • deep learning

SLIDE 17

Nonconvex optimization may be super scary

 Challenges: saddle points, local optima, bumps,…

 Fact: they are usually solved on a daily basis via simple algorithms like (stochastic) gradient descent


  • Fig. credit: Chen
SLIDE 18

Statistical models come to rescue

 Blessings: when data are generated by certain statistical models, problems are often much nicer than worst-case instances


  • Fig. credit: Chen
SLIDE 19

First-order stationary points

 Saddle points and local minima:

Figure: local minima vs. saddle points/local maxima

SLIDE 20

First-order stationary points

 Applications: PCA, matrix completion, dictionary learning, etc.

  • Local minima: either all local minima are global minima, or all local minima are as good as global minima

  • Saddle points: very poor compared to global minima; several such points

 Bottom line: local minima are much more desirable than saddle points


How to escape saddle points efficiently?

SLIDE 21

Statistics meets optimization

 Proposal: separation of landscape analysis and generic algorithm design


landscape analysis (statistics): all local minima are global minima

  • dictionary learning (Sun et al. ’15)
  • phase retrieval (Sun et al. ’16)
  • matrix completion (Ge et al. ’16)
  • synchronization (Bandeira et al. ’16)
  • inverting deep neural nets (Hand et al. ’17)
  • ...

generic algorithms (optimization): all the saddle points can be escaped

  • gradient descent (Lee et al. ’16)
  • trust region method (Sun et al. ’16)
  • perturbed GD (Jin et al. ’17)
  • cubic regularization (Agarwal et al. ’17)
  • Natasha (Allen-Zhu ’17)
  • ...

Issue: conservative computational guarantees for specific problems (e.g., phase retrieval, blind deconvolution, matrix completion)

  • Fig. credit: Chen
SLIDE 22

Solution: blending landscape and convergence analysis


implicitly regularized Wirtinger flow

SLIDE 23

A natural least-squares formulation

 Goal: demixing from bilinear measurements

  • Pros: computationally efficient in the natural parameter space
  • Cons: the objective is nonconvex: bilinear constraint, scaling ambiguity

Given: the measurements $\{y_j\}$ and the design vectors $\{b_j\}$, $\{a_{ij}\}$
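
In the notation of the bilinear model above, the resulting least-squares objective can be sketched as (my formulation):

$$f(\{h_i, x_i\}_{i=1}^{s}) = \sum_{j=1}^{L} \Big| \sum_{i=1}^{s} b_j^{\mathsf{H}} h_i \, x_i^{\mathsf{H}} a_{ij} - y_j \Big|^2$$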

SLIDE 24

Wirtinger flow

 Least-squares minimization via Wirtinger flow (Candes, Li, Soltanolkotabi ’14)

  • Spectral initialization by the top eigenvector of a matrix constructed from the data
  • Gradient iterations (a simplified sketch follows below)
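
A minimal illustrative sketch of this recipe in Python, for a simplified real-valued, single-component bilinear model $y_j = (b_j^{\mathsf{T}} h)(a_j^{\mathsf{T}} x)$. The dimensions, step size, and iteration count below are assumptions for illustration; the actual blind-demixing algorithm is complex-valued and handles multiple components.

```python
import numpy as np

# Toy Wirtinger-flow-style recipe: spectral initialization + plain gradient descent
# on a real-valued, single-component bilinear model y_j = (b_j^T h)(a_j^T x).
rng = np.random.default_rng(0)
K, N, L = 10, 10, 400
h_true, x_true = rng.standard_normal(K), rng.standard_normal(N)
B, A = rng.standard_normal((L, K)), rng.standard_normal((L, N))
y = (B @ h_true) * (A @ x_true)          # bilinear measurements

# Spectral initialization: top singular vectors of M = sum_j y_j b_j a_j^T,
# whose expectation is proportional to h_true x_true^T.
M = B.T @ (y[:, None] * A)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
h, x = np.sqrt(s[0]) * U[:, 0], np.sqrt(s[0]) * Vt[0]

# Plain gradient iterations on the least-squares loss (no explicit regularization).
eta = 0.2 / (L * s[0])                   # conservative constant step size
for _ in range(500):
    r = (B @ h) * (A @ x) - y            # residuals
    h, x = h - eta * (B.T @ (r * (A @ x))), x - eta * (A.T @ (r * (B @ h)))

# Evaluate up to the inherent scaling ambiguity (h, x) -> (c h, x / c).
err = np.linalg.norm(np.outer(h, x) - np.outer(h_true, x_true))
print("relative error:", err / np.linalg.norm(np.outer(h_true, x_true)))
```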

SLIDE 25

Two-stage approach

 Initialize within a local basin sufficiently close to the ground truth (i.e., the region is strongly convex, with no saddle points or spurious local minima)

 Iterative refinement via some iterative optimization algorithm


  • Fig. credit: Chen
SLIDE 26

Gradient descent theory

 Two standard conditions that enable geometric convergence of GD (standard statement below)

  • (local) restricted strong convexity
  • (local) smoothness
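
In standard form (stated for a generic objective $f$ with minimizer $z^{\star}$; this is textbook gradient-descent analysis rather than anything specific to blind demixing): if, locally around $z^{\star}$,

$$\alpha I \preceq \nabla^2 f(z) \preceq \beta I,$$

then gradient descent with step size $\eta = 1/\beta$ contracts the distance geometrically, provided the iterates stay in that region:

$$\|z_{t+1} - z^{\star}\|_2^2 \le \big(1 - \alpha/\beta\big)\, \|z_t - z^{\star}\|_2^2.$$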

SLIDE 27

Gradient descent theory

 Question: which region enjoys both strong convexity and smoothness?

  • the iterate is not far away from the ground truth (region of convexity)

  • the iterate is incoherent w.r.t. the sampling vectors (incoherence region, for smoothness)


Prior works suggest enforcing regularization (e.g., regularized loss [Ling & Strohmer’17]) to promote incoherence

SLIDE 28

Our finding: WF is implicitly regularized

 WF (GD) implicitly forces the iterates to remain incoherent with the sampling vectors

  • cannot be derived from generic optimization theory
  • relies on finer statistical analysis of the entire trajectory of GD

Figure: region of local strong convexity and smoothness

SLIDE 29

Key proof idea: leave-one-out analysis

 Introduce leave-one-out iterates by running WF without the l-th sample

 The leave-one-out iterate is independent of the l-th sampling vector

 The leave-one-out iterate stays close to the true iterate, hence the true iterate is nearly independent of (i.e., nearly orthogonal to) the l-th sampling vector

SLIDE 30

Theoretical guarantees

 With i.i.d. Gaussian design, WF (regularization-free) achieves

  • Incoherence
  • Near-linear convergence rate

 Summary (compared against [Ling & Strohmer’17]):

  • Sample size:
  • Stepsize:
  • Computational complexity:

SLIDE 31

Numerical results

 Stepsize:
 Number of users:
 Sample size:


Linear convergence: WF attains $\epsilon$-accuracy within $O(\log(1/\epsilon))$ iterations

SLIDE 32

Is carefully-designed initialization necessary?

SLIDE 33

Numerical results of randomly initialized WF

 Stepsize:
 Number of users:
 Sample size:
 Initial point:

Randomly initialized WF enters the local basin within a small number of iterations

SLIDE 34

Analysis: population dynamics

 Signal strength: defined via the alignment parameter
 Size of the residual component
 State evolution

Population level (infinite sample): local basin

SLIDE 36

Analysis: finite-sample analysis

 Population-level analysis holds approximately if the finite-sample deviation is well-controlled

 The deviation is well-controlled if the iterate is independent of the sampling vectors

 Key analysis ingredient: show the iterate is “nearly independent” of each sampling vector


  • Fig. credit: Chen

The deviation is well-controlled in this region

SLIDE 37

Theoretical guarantees

 With i.i.d. Gaussian design, WF with random initialization achieves:

 Summary:

  • Stepsize:
  • Sample size:
  • Stage I: reach the local basin within relatively few iterations
  • Stage II: linear convergence
  • Computational complexity:

SLIDE 38

Vignette B: Matrix optimization over manifolds


Optimization over Riemannian Manifolds (non-Euclidean geometry)

SLIDE 39

Why manifold optimization?

SLIDE 40

What is manifold optimization?

 Manifold (or manifold-constrained) optimization problem: $\min_{x \in \mathcal{M}} f(x)$

  • $f$ is a smooth function
  • $\mathcal{M}$ is a Riemannian manifold: spheres, orthonormal bases (Stiefel), rotations, positive definite matrices, fixed-rank matrices, Euclidean distance matrices, semidefinite fixed-rank matrices, linear subspaces (Grassmann), phases, essential matrices, fixed-rank tensors, Euclidean spaces...

SLIDE 41

Convergence results of manifold optimization

 Convergence guarantees for Riemannian trust regions

  • Global convergence to second-order critical points
  • Quadratic convergence rate locally
  • Reach an approximate second-order stationary point (defined below) in finitely many iterations under Lipschitz assumptions [Cartis & Absil’16]
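
A standard way to make that target precise (general definition with tolerances $\varepsilon_g, \varepsilon_H > 0$, not a statement specific to these slides): an approximate second-order stationary point $x \in \mathcal{M}$ satisfies

$$\|\operatorname{grad} f(x)\| \le \varepsilon_g \quad \text{and} \quad \lambda_{\min}\big(\operatorname{Hess} f(x)\big) \ge -\varepsilon_H.$$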

Escape strict saddle points by finding second-order stationary points

SLIDE 42

Recent applications of manifold optimization

 High-dimensional data analysis: matrix/tensor completion/recovery: [Vandereycken’13], [Boumal-Absil’15], [Kasai-Mishra’16]; phase retrieval: [Sun-Qu-Wright’17]; community detection: [Boumal’16], [Bandeira-Boumal-Voroninski’16],…

 Machine and deep learning: Gaussian mixture models: [Hosseini-Sra’15]; dictionary learning: [Sun-Qu-Wright’17]; deep metric learning: [Roy-Mhammedi-Harandi’18],…

 Wireless transceiver design: [Shi-Zhang-Letaief’16], [Yu-Shen-Zhang-Letaief’16], [Shi-Mishra-Chen’17],…


Exploit manifold geometry to address non-convex problems

SLIDE 43

The power of manifold optimization paradigms

 Generalize the Euclidean gradient (Hessian) to the Riemannian gradient (Hessian)

 We need Riemannian geometry: 1) linearize the search space $\mathcal{M}$ around the current point $x$ into a tangent space $T_x\mathcal{M}$; 2) pick a metric on $T_x\mathcal{M}$ to give intrinsic notions of gradient and Hessian
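
With this notation, a Riemannian gradient step takes the standard form (for an embedded submanifold with the metric inherited from the ambient Euclidean space, where $R_x$ denotes a retraction and $\eta$ a step size):

$$\operatorname{grad} f(x_k) = P_{T_{x_k}\mathcal{M}}\big(\nabla f(x_k)\big), \qquad x_{k+1} = R_{x_k}\big(-\eta \operatorname{grad} f(x_k)\big).$$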

Figure: Euclidean gradient, Riemannian gradient, and retraction operator

SLIDE 44

An excellent book: Optimization Algorithms on Matrix Manifolds; a MATLAB toolbox is also available

SLIDE 45

Taking a close look at gradient descent

SLIDE 46

Optimization on the manifold: main idea

SLIDE 50

Example: Rayleigh quotient

 Optimization over the sphere manifold $\mathcal{S}^{n-1} = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$, with $A$ a symmetric matrix: $\min_{x \in \mathcal{S}^{n-1}} x^{\mathsf{T}} A x$

  • The cost function is smooth on $\mathcal{S}^{n-1}$

 Step 1: Compute the Euclidean gradient in $\mathbb{R}^n$: $\nabla f(x) = 2 A x$

 Step 2: Compute the Riemannian gradient on $\mathcal{S}^{n-1}$ by projecting onto the tangent space using the orthogonal projector $P_x = I - x x^{\mathsf{T}}$: $\operatorname{grad} f(x) = (I - x x^{\mathsf{T}})\, 2 A x$ (a small code sketch follows below)
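
A compact sketch of these two steps as Riemannian gradient descent in Python; the normalization retraction, step size, and iteration count are illustrative choices (assumptions), not taken from the slides.

```python
import numpy as np

# Riemannian gradient descent for the Rayleigh quotient f(x) = x^T A x on the sphere.
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                      # symmetric matrix

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                 # start on the sphere
eta = 0.1 / np.linalg.norm(A, 2)       # conservative step size

for _ in range(2000):
    egrad = 2 * A @ x                  # Step 1: Euclidean gradient
    rgrad = egrad - (x @ egrad) * x    # Step 2: project onto the tangent space, (I - x x^T) egrad
    x = x - eta * rgrad                # move along the negative Riemannian gradient
    x /= np.linalg.norm(x)             # retraction: map back onto the sphere

# x approximates the eigenvector associated with the smallest eigenvalue of A
print("Rayleigh quotient:", x @ A @ x)
print("smallest eigenvalue:", np.linalg.eigvalsh(A)[0])
```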

SLIDE 51

Riemannian optimization for blind demixing

SLIDE 52

Blind demixing via low-rank optimization

 Linear mapping: from the bilinear model to a linear model (sketched below)

  • Proposal: (non-convex) low-rank optimization problem
  • Challenges: nonconvex constraints, complex asymmetric matrices
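
One possible way to write the linear mapping (my notation, consistent with the bilinear model above): define $\mathcal{A}_i : \mathbb{C}^{K \times N} \to \mathbb{C}^{L}$ by $[\mathcal{A}_i(M)]_j = b_j^{\mathsf{H}} M a_{ij}$, so that

$$y = \sum_{i=1}^{s} \mathcal{A}_i\big(h_i x_i^{\mathsf{H}}\big),$$

which is linear in the lifted variables $M_i = h_i x_i^{\mathsf{H}}$.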

SLIDE 53

Blind demixing via Riemannian optimization

 Handle complex asymmetric matrices

  • Define a linear map that relates the Hermitian lifted variables to the measurements

 Matrix optimization over the product manifolds

  • Key observations: the set of rank-one Hermitian positive semidefinite matrices is a manifold; multiple rank-one constraints construct a product manifold (see the note below)
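
A standard way to see the key observation (a general fact, not specific to these slides): every rank-one Hermitian positive semidefinite matrix factors as

$$M = z z^{\mathsf{H}}, \qquad z \in \mathbb{C}^{n} \setminus \{0\},$$

with $z$ determined up to a global phase, so the set of such matrices is a smooth (quotient) manifold, and imposing one such constraint for each $i = 1, \dots, s$ yields a product manifold.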

SLIDE 54


Riemannian optimization over product manifolds

 Elementwise extension principles

  • The manifold topology of the product manifold is equivalent to the product topology

SLIDE 55

Element-wise optimization-related ingredients

 Riemannian optimization for blind demixing

SLIDE 56

Numerical results

 Optimize over the product of multiple rank-one Hermitian positive semidefinite matrices


Riemannian algorithms: 1) exploit the rank structure in a principled way; 2) develop second-order algorithms systematically; 3) scalable, SVD-free

SLIDE 57

Concluding remarks

 Implicitly regularized Wirtinger flow

  • Implicit regularization: vanilla gradient descent automatically forces iterates to stay incoherent
  • Even the simplest nonconvex methods are remarkably efficient under suitable statistical models

 Matrix optimization over manifolds

  • Exploit the manifold geometry of multiple rank-one Hermitian positive semidefinite matrices
  • Develop second-order algorithms systematically: escape saddle points, quadratic convergence rate

 Future works: sparse blind demixing, convolutional dictionary learning [Wright, CVPR’17], convolutional neural networks [Papyan et al., SPM’18],…

SLIDE 58

Reference

 J. Dong and Y. Shi, “Nonconvex demixing from bilinear measurements,” IEEE Trans. Signal Process., vol. 66, no. 19, pp. 5152-5166, Oct. 2018.

 J. Dong, K. Yang, and Y. Shi, “Blind demixing for low-latency communication,” IEEE Trans. Wireless Commun., vol. 18, no. 2, pp. 897-911, Feb. 2019.

 J. Dong, Y. Shi, and Z. Ding, “Blind over-the-air computation and data fusion via provable Wirtinger flow,” https://arxiv.org/abs/1811.04644.

 J. Dong and Y. Shi, “Blind demixing via Wirtinger flow with random initialization,” in Proc. Int. Conf. Artificial Intell. Stat. (AISTATS), 2019.

SLIDE 59

Thanks