1. Nonconvex Phase Retrieval with Random Gaussian Measurements. Yuejie Chi, Department of Electrical and Computer Engineering. December 2017, CSA, Berlin.

2. Acknowledgements
• Primary Collaborators: Yingbin Liang (OSU), Yuanxin Li (OSU), Yi Zhou (OSU), Huishuai Zhang (MSRA), Yuxin Chen (Princeton), Cong Ma (Princeton), and Kaizheng Wang (Princeton).
• This research is supported by AFOSR, ONR, and NSF.

3. Phase Retrieval: The Missing Phase Problem
• In high-frequency (e.g., optical) applications, the detection devices (e.g., CCD cameras, photosensitive films, and the human eye) cannot measure the phase of a light wave.
• Optical devices measure the photon flux (the number of photons per second per unit area), which is proportional to the squared magnitude of the field.
• This leads to the so-called phase retrieval problem: inference from intensity-only measurements.

4. Computational Imaging
Phase retrieval is the foundation of modern computational imaging.
Example applications: terahertz imaging, ankylography, ptychography, space telescopes.

5. Mathematical Setup
• Phase retrieval: estimate $x^\natural \in \mathbb{R}^n / \mathbb{C}^n$ from $m$ phaseless measurements
$$y_i = |\langle a_i, x^\natural \rangle|, \quad i = 1, \ldots, m,$$
where $a_i$ corresponds to the $i$th measurement vector. Typical choices:
• $a_i$'s are (coded or oversampled) Fourier transform vectors;
• $a_i$'s are short-time Fourier transform vectors;
• $a_i$'s are "generic" vectors such as random Gaussian vectors.
• In vectorized notation, $y = |A x^\natural| \in \mathbb{R}_+^m$, where $A \in \mathbb{R}^{m \times n} / \mathbb{C}^{m \times n}$ has $a_i^*$ as its $i$th row.
• Phase retrieval solves a quadratic (nonlinear) system of equations, since
$$y_i^2 = |\langle a_i, x^\natural \rangle|^2 = (x^\natural)^* a_i a_i^* x^\natural, \quad i = 1, \ldots, m.$$
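To make this setup concrete, here is a small sketch (Python/NumPy; the function name and parameters are my own, not from the talk) that generates i.i.d. Gaussian measurement vectors and the phaseless measurements $y = |A x^\natural|$:

```python
import numpy as np

def gaussian_phaseless_measurements(x, m, seed=0, complex_valued=False):
    """Generate m phaseless measurements y_i = |<a_i, x>| with i.i.d. Gaussian a_i.

    Illustrative sketch of the measurement model on this slide. The rows of the
    returned matrix A play the role of a_i^* in the notation y = |A x|.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    if complex_valued:
        # a_i ~ CN(0, I): real and imaginary parts each N(0, 1/2)
        A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    else:
        # a_i ~ N(0, I)
        A = rng.standard_normal((m, n))
    y = np.abs(A @ x)            # entrywise magnitudes |<a_i, x>|
    return A, y
```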

6. Nonconvex Procedure
$$\hat{x} = \operatorname*{argmin}_{x \in \mathbb{R}^n / \mathbb{C}^n} \; \frac{1}{m} \sum_{i=1}^m \ell(y_i; x)$$
• Initialize $x_0$ via spectral methods to land in a neighborhood of the ground truth;
• Iteratively update using simple methods such as gradient descent or alternating minimization.
Figure credit: Yuxin Chen.
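As a concrete sketch of the initialization step, the classical spectral initializer (the variant used in Wirtinger Flow; RWF and TWF use reweighted or truncated variants) takes a scaled leading eigenvector of a data matrix built from the measurements. The code below is illustrative, not taken from the talk:

```python
import numpy as np

def spectral_init(A, y):
    """Spectral initialization (sketch): direction from the top eigenvector of
    (1/m) * sum_i y_i^2 a_i a_i^*, scale from sqrt(mean(y_i^2)) ~ ||x||_2.
    """
    m = A.shape[0]
    Y = (A.conj().T * y**2) @ A / m        # (1/m) * A^H diag(y^2) A
    _, eigvecs = np.linalg.eigh(Y)         # Hermitian eigendecomposition (ascending)
    direction = eigvecs[:, -1]             # leading eigenvector
    return np.sqrt(np.mean(y**2)) * direction
```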

9. Quadratic Loss of Amplitudes
Wirtinger Flow (WF) employs the intensity-based loss [Candès et al.]:
$$\ell_{WF}(x) = \frac{1}{m} \sum_{i=1}^m \left( |\langle a_i, x^\natural \rangle|^2 - |\langle a_i, x \rangle|^2 \right)^2,$$
which is nonconvex and smooth.
Reshaped Wirtinger Flow (RWF): in contrast, we propose to minimize the quadratic loss of the amplitude measurements:
$$\ell(x) := \frac{1}{m} \| y - |Ax| \|_2^2 = \frac{1}{m} \sum_{i=1}^m \ell(y_i; x) = \frac{1}{m} \sum_{i=1}^m \left( |\langle a_i, x^\natural \rangle| - |\langle a_i, x \rangle| \right)^2,$$
which is nonconvex and nonsmooth.
Which one is better?
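For concreteness, the two losses can be evaluated as follows (a real-valued sketch; the function names are mine, not from the talk):

```python
import numpy as np

def wf_loss(A, x, y):
    """Intensity-based WF loss: (1/m) * sum_i (y_i^2 - (a_i^T x)^2)^2  (smooth)."""
    return np.mean((y**2 - (A @ x)**2) ** 2)

def rwf_loss(A, x, y):
    """Amplitude-based RWF loss: (1/m) * ||y - |Ax|||_2^2  (nonsmooth where a_i^T x = 0)."""
    return np.mean((y - np.abs(A @ x)) ** 2)
```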

10. The Choice of Loss Function is Important
The quadratic loss of amplitudes enjoys better curvature in expectation!
[Figure: surfaces of the expected loss for (a) least-squares (mirrored symmetrically), (b) the quadratic loss of amplitudes $\ell(x)$, and (c) the quadratic loss of intensities $\ell_{WF}(x)$, when $x^\natural = [1, -1]^\top$.]
In fact, Error Reduction (ER), proposed by Gerchberg and Saxton in 1972, can be interpreted as alternating minimization using $\ell(x)$.
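To illustrate this interpretation (a sketch under the generic-measurement setting, not code from the talk): alternating minimization of $\ell(x)$ alternates between fixing the missing signs $c = \operatorname{sign}(Ax)$ and re-solving the resulting least-squares problem in $x$; with Fourier-type measurements and support constraints, this viewpoint reduces to the classical ER iteration.

```python
import numpy as np

def alt_min_amplitude_loss(A, y, x0, n_iters=100):
    """Alternating minimization of l(x) = (1/m) ||y - |Ax|||_2^2 (real-valued sketch).

    Step 1: fix the signs c_i = sign(a_i^T x) of the current iterate.
    Step 2: solve the least-squares problem min_x ||c * y - A x||_2.
    """
    x = x0.copy()
    for _ in range(n_iters):
        c = np.sign(A @ x)                              # current sign estimates
        x, *_ = np.linalg.lstsq(A, c * y, rcond=None)   # least-squares refit
    return x
```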

11. Gradient Descent
Reshaped Wirtinger Flow (RWF):
• The generalized gradient of $\ell(x)$ can be calculated as
$$\nabla \ell(x) = \frac{1}{m} \sum_{i=1}^m \big( \langle a_i, x \rangle - y_i \cdot \operatorname{sign}(\langle a_i, x \rangle) \big)\, a_i.$$
• Start with an initialization $x_0$. At iteration $t = 0, 1, \ldots$,
$$x_{t+1} = x_t - \mu \nabla \ell(x_t) = \Big( I - \frac{\mu}{m} A^* A \Big) x_t + \frac{\mu}{m} A^* \operatorname{diag}(y)\, \operatorname{sign}(A x_t),$$
where $\mu$ is the step size.
• Stochastic versions are even faster.
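A minimal real-valued sketch of this update (the step size and iteration budget below are illustrative choices, not values prescribed by the talk):

```python
import numpy as np

def rwf(A, y, x0, mu=0.8, n_iters=200):
    """Reshaped Wirtinger Flow: (generalized) gradient descent on the amplitude loss."""
    m = A.shape[0]
    x = x0.copy()
    for _ in range(n_iters):
        Ax = A @ x
        # (1/m) * sum_i (a_i^T x - y_i * sign(a_i^T x)) a_i
        grad = A.T @ (Ax - y * np.sign(Ax)) / m
        x = x - mu * grad
    return x
```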

12. Statistical Measurement Model
Strong performance guarantees are possible by leveraging statistical properties of the measurement ensemble.
• Gaussian measurement model: $a_i \sim \mathcal{N}(0, I)$ i.i.d. if real-valued; $a_i \sim \mathcal{CN}(0, I)$ i.i.d. if complex-valued.
• Distance measure (modulo the global phase ambiguity):
$$\operatorname{dist}(x, z) = \min_{\phi \in [0, 2\pi)} \| x - e^{j\phi} z \|.$$
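A small helper computing this distance (a sketch; the closed-form optimal phase $e^{j\phi^\star} = \langle z, x\rangle / |\langle z, x\rangle|$ follows from expanding the squared norm):

```python
import numpy as np

def dist_up_to_phase(x, z):
    """dist(x, z) = min over phi in [0, 2*pi) of ||x - exp(1j*phi) * z||_2.

    For real-valued signals this reduces to min(||x - z||_2, ||x + z||_2).
    """
    inner = np.vdot(z, x)                                  # z^* x
    phase = 1.0 if np.abs(inner) == 0 else inner / np.abs(inner)
    return float(np.linalg.norm(x - phase * z))
```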

14. Linear Convergence of RWF
Theorem (Zhang, Zhou, Liang, C., 2016). Under the i.i.d. Gaussian design, RWF achieves
$$\| x_t - x^\natural \|_2 \lesssim (1 - \mu)^t \, \| x^\natural \|_2 \qquad \text{(linear convergence)}$$
provided that the step size $\mu = O(1)$ is small enough and the sample size satisfies $m \gtrsim n$.

          loss function     regularization   step size   sample size
  WF      intensity-based   no               O(1/n)      O(n log n)
  RWF     amplitude-based   no               O(1)        O(n)
  TWF     intensity-based   truncation       O(1)        O(n)

WF can be improved by designing a better loss function or by introducing proper regularization. But is it really that bad?
Zhang, Zhou, Liang, and C., "Reshaped Wirtinger Flow and Incremental Algorithms for Solving Quadratic Systems of Equations," Journal of Machine Learning Research, to appear.
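Putting the previous sketches together, a quick synthetic experiment along the following lines can be used to observe this geometric error decay empirically (the dimensions, step size, and iteration count are arbitrary illustrative choices, not values from the theorem):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
m = 8 * n                                   # sample size proportional to n
x_true = rng.standard_normal(n)

A, y = gaussian_phaseless_measurements(x_true, m)   # from the sketch above
x = spectral_init(A, y)

errors = []
for t in range(150):
    Ax = A @ x
    x = x - 0.8 * (A.T @ (Ax - y * np.sign(Ax)) / m)   # one RWF step, mu = O(1)
    errors.append(dist_up_to_phase(x, x_true))
# errors should decay geometrically once the iterate enters the basin of attraction
```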

16. Another look at WF
• The local Hessian of the WF loss satisfies, w.h.p., when $m = O(n \log n)$:
$$\tfrac{1}{2} I \preceq \nabla^2 \ell_{WF}(x) \preceq n I.$$
• This implies a step size of $\mu = O(1/n)$, hence $O(n \log(1/\epsilon))$ iterations to reach $\epsilon$-accuracy.
Numerically, WF can run much more aggressively! ($\mu = 0.1$)

17. Gradient descent theory
Which region enjoys both strong convexity and smoothness?
$$\nabla^2 \ell_{WF}(x) = \frac{1}{m} \sum_{k=1}^m \left[ 3 (a_k^\top x)^2 - (a_k^\top x^\natural)^2 \right] a_k a_k^\top$$
• Not smooth if $x$ and $a_k$ are too close (coherent).

19. Gradient descent theory
Which region enjoys both strong convexity and smoothness?
[Figure: $x^\natural$ and the sampling vectors $a_1, a_2$, illustrating the incoherence region.]
• $x$ is not far away from $x^\natural$;
• $x$ is incoherent w.r.t. the sampling vectors (incoherence region), i.e., for each $k$,
$$|a_k^\top (x - x^\natural)| \lesssim \sqrt{\log n}\, \| x - x^\natural \|_2.$$
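To make the incoherence region concrete, here is a tiny diagnostic (a sketch; the constants are unspecified in the theory) that measures how coherent a point $x$ is with the sampling vectors relative to the $\sqrt{\log n}$ scale:

```python
import numpy as np

def incoherence_ratio(A, x, x_true):
    """max_k |a_k^T (x - x_true)| / (sqrt(log n) * ||x - x_true||_2).

    Ratios of order one indicate that x lies in the incoherence region
    described on this slide (up to unspecified constants).
    """
    n = A.shape[1]
    h = x - x_true
    return float(np.max(np.abs(A @ h)) / (np.sqrt(np.log(n)) * np.linalg.norm(h)))
```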

22. A second look at gradient descent theory
[Figure: the region of local strong convexity + smoothness, with gradient descent iterates.]
• Prior theory only ensures that the iterates remain in the $\ell_2$ ball, but not in the incoherence region.

