Homotopy Analysis for Tensor PCA
Yuan Deng, Duke University
Joint work with Anima Anandkumar, Rong Ge, Hossein Mobahi
Non-convex Optimization
• Optimizing a smooth function f(x)
• The landscape contains both local optima and the global optimum
• How do we get rid of local optima?
Gaussian Smoothing
• Idea: smooth the function by convolving it with a Gaussian 𝒩(0, t)
• The local minimum disappears!
• But the global optimum is shifted
• How do we decide how much to smooth?
• How do we recover the original global optimum?
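To make the smoothing step concrete, here is a minimal sketch (my own illustration, not from the talk): it estimates the smoothed value E_{z∼𝒩(0,t)}[f(x+z)] by Monte Carlo for a toy one-dimensional function, treating t as the standard deviation of the smoothing Gaussian, and prints how the minimizer moves as t grows. The toy function f and the estimator are assumptions made only for this example.

```python
# A minimal sketch of Gaussian smoothing (illustration only; the toy function f
# and the Monte Carlo estimator are my own choices, not from the talk).
# Smoothing replaces f(x) by E_{z ~ N(0, t^2)}[f(x + z)].
import numpy as np

def smoothed(f, x, t, n_samples=20000, seed=0):
    """Monte Carlo estimate of the Gaussian-smoothed value E[f(x + t*z)]."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    return np.mean(f(x + t * z))

# A toy function with spurious local minima next to the global one.
f = lambda x: 0.05 * x**2 + np.sin(3 * x)

xs = np.linspace(-4, 4, 401)
for t in [0.0, 0.5, 1.0, 2.0]:
    vals = [f(x) if t == 0 else smoothed(f, x, t) for x in xs]
    # Watch how the minimizer of the smoothed function moves as t grows.
    print(f"t={t:3.1f}  minimizer of the smoothed function ~ {xs[np.argmin(vals)]: .2f}")
```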
Homotopy Method
• Try all levels of smoothing!
Homotopy Method
• Widely used in computer vision: image deblurring [Boccuto et al., 2002], image restoration [Nikolova et al., 2010], optical flow [Brox & Malik, 2011]
• Also used for clustering [Gold, 1994] and graph matching [Zaslavskiy et al., 2009]
• But essentially no theoretical guarantees on the solution: known conditions are either too restrictive [Mobahi and Fisher III, 2015] or difficult to check [Hazan et al., 2016]
Homotopy Method
• The choice of smoothing levels is handcrafted
• Slow: the local search is repeated for each smoothing level
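The classical recipe on this slide can be written as a short continuation loop. The sketch below is a generic version of it, not the talk's algorithm: `local_search`, `homotopy_minimize`, the schedule, and the toy objective are all placeholder choices of mine.

```python
# A generic sketch of the classical homotopy / continuation loop: a handcrafted,
# decreasing schedule of smoothing levels t, a local search at each level, and
# warm-starting each search from the previous solution.
import numpy as np

def local_search(grad, x0, steps=400, lr=0.1):
    """Plain gradient descent used as the inner local-search routine."""
    x = float(x0)
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def homotopy_minimize(smoothed_grad, x0, schedule=(2.0, 1.0, 0.5, 0.25, 0.0)):
    """Run local search at each smoothing level, warm-started from the last one."""
    x = float(x0)
    for t in schedule:                      # handcrafted smoothing levels
        x = local_search(lambda y: smoothed_grad(y, t), x)
    return x

# Toy objective f(x) = 0.05 x^2 + sin(3x); its Gaussian-smoothed gradient is
# available in closed form: 0.1 x + 3 exp(-4.5 t^2) cos(3x).
smoothed_grad = lambda x, t: 0.1 * x + 3 * np.exp(-4.5 * t**2) * np.cos(3 * x)

print(local_search(lambda x: smoothed_grad(x, 0.0), x0=3.0))  # stuck near a local minimum (~3.6)
print(homotopy_minimize(smoothed_grad, x0=3.0))               # near the global minimizer (~ -0.52)
```

Warm-starting is what makes the repeated local searches pay off, but the schedule itself is still handcrafted, which is exactly the drawback the slide points out.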
Tensor PCA [Richard and Montanari 2014]
• Probabilistic model for PCA: signal v ∈ ℝ^d, τ ≥ 0 is the signal-to-noise ratio
• Matrix case: M = τ v v^T + A, where A is Gaussian noise
• Tensor PCA: T = τ v ⊗ v ⊗ v + A
• Objective: design an efficient algorithm that recovers v for as small a τ as possible
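A quick way to get a feel for the model is to generate a synthetic instance. The sketch below is only an illustration: it draws a random unit signal v and adds an i.i.d. standard Gaussian noise tensor A, which is one common convention (papers differ on how the noise is normalized or symmetrized).

```python
# A sketch of generating a synthetic tensor PCA instance T = tau * v⊗v⊗v + A,
# with v a random unit vector in R^d and A an i.i.d. standard Gaussian noise
# tensor (one common convention, used here purely for illustration).
import numpy as np

def tensor_pca_instance(d, tau, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)                        # unit-norm signal vector
    signal = tau * np.einsum('i,j,k->ijk', v, v, v)
    A = rng.standard_normal((d, d, d))            # Gaussian noise tensor
    return signal + A, v

T, v = tensor_pca_instance(d=50, tau=50 ** 0.75)  # tau on the order of d^{3/4}
```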
Previous Work
• [Richard & Montanari 2014] Can find v when τ = Ω(d) in polynomial time, and when τ = Ω(√d) in exponential time
• [Hopkins, Shi & Steurer 2015] The Sum-of-Squares technique can find v when τ = Ω̃(d^{3/4}) in polynomial time
• The basic Sum-of-Squares algorithm is very slow
• Its running time can be improved to Õ(d^3), nearly linear in the input size
Our Results

Method            Bound on τ     Extra Space   Time
Ours              Ω̃(d^{3/4})    O(d)          Õ(d^3)
State of the art  Ω̃(d^{3/4})    O(d^2)        Õ(d^3)

• Guarantee matches the best known result
• Better convergence rate when τ is close to d^{3/4}
• One of the first results to provably analyze the homotopy method
Optimization for Tensor PCA
• Recall: for matrix PCA, we optimize max_{‖x‖=1} x^T M x = τ⟨v, x⟩^2 + x^T A x
• For tensor PCA, we optimize max_{‖x‖=1} T(x, x, x) = τ⟨v, x⟩^3 + A(x, x, x)
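For completeness, here is how the multilinear form T(x, x, x) can be evaluated numerically, together with a check of the decomposition into signal and noise terms on a synthetic instance (an illustration of mine, not code from the talk).

```python
# Evaluating the multilinear form T(x, x, x) with einsum, and a numerical check
# of the decomposition  T(x, x, x) = tau * <v, x>^3 + A(x, x, x)
# on a synthetic instance as above.
import numpy as np

def multilinear(T, x):
    return np.einsum('ijk,i,j,k->', T, x, x, x)

d, tau = 50, 50 ** 0.75
rng = np.random.default_rng(1)
v = rng.standard_normal(d); v /= np.linalg.norm(v)
A = rng.standard_normal((d, d, d))
T = tau * np.einsum('i,j,k->ijk', v, v, v) + A

x = rng.standard_normal(d); x /= np.linalg.norm(x)
lhs = multilinear(T, x)
rhs = tau * np.dot(v, x) ** 3 + multilinear(A, x)
print(abs(lhs - rhs) < 1e-8)    # the two sides agree up to floating point
```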
Infinite Smoothing (t = ∞)
• The smoothed objective has a unique optimum x*
• Its correlation with v is τ/d = Ω(d^{-0.25})
• [A random unit vector has correlation about d^{-0.5}]
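One way to read this slide: convolving T(x, x, x) with the smoothing Gaussian adds a term that is linear in x, with coefficient vector proportional to u where u_j = Σ_i T_{iij} (up to symmetrization), so as t → ∞ the maximizer over the sphere approaches u/‖u‖. The sketch below checks the stated correlations numerically under that reading; it is my own calculation, not the talk's code.

```python
# A numerical check of the t = infinity picture (my reading, not the talk's code):
# for very large t the maximizer over the unit sphere approaches u / ||u|| with
# u_j = sum_i T_{iij}. Its correlation with v should be about tau/d ~ d^{-0.25},
# versus ~ d^{-0.5} for a random unit vector.
import numpy as np

d = 200
tau = d ** 0.75
rng = np.random.default_rng(2)
v = rng.standard_normal(d); v /= np.linalg.norm(v)
T = tau * np.einsum('i,j,k->ijk', v, v, v) + rng.standard_normal((d, d, d))

u = np.einsum('iij->j', T)          # partial trace: u_j = sum_i T_{iij}
x_star = u / np.linalg.norm(u)      # t -> infinity optimum (up to sign)

r = rng.standard_normal(d); r /= np.linalg.norm(r)
print("|<x*, v>|     =", abs(np.dot(x_star, v)))   # roughly d^{-0.25} ~ 0.27
print("|<random, v>| =", abs(np.dot(r, v)))        # roughly d^{-0.5}  ~ 0.07
```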
Phase Transition in Homotopy Method
• Lemma*: there is a threshold θ for the smoothing level t
[Figure: the smoothed landscape for t > θ, t = θ, and t < θ, with the signal direction v marked]
• If we use infinitely many steps, i.e., decrease t continuously from ∞ to 0:
• for t > θ: ‖x_t − x*‖ ≤ o(1)·‖x*‖
• for t < θ: ⟨x_t, v⟩ = Ω(1)
Phase Transition
• Example: f(x) = −x^4 + 0.8 x^2
[Figure: four panels showing the smoothed function f̂(x, t) at smoothing levels t = 0.4, 0.3, 0.2, and 0; the single maximizer at t = 0.4 splits into two as t decreases]
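The phase transition in this toy example can be reproduced in closed form. The calculation below is my own, and it treats the second argument of each panel as the standard deviation t of the smoothing Gaussian; under that convention the transition happens near t ≈ 0.37, between the t = 0.4 and t = 0.3 panels.

```python
# Reproducing the phase transition for f(x) = -x^4 + 0.8 x^2 in closed form
# (my own calculation, treating t as the standard deviation of the smoothing
# Gaussian). Using E[(x+tz)^2] = x^2 + t^2 and E[(x+tz)^4] = x^4 + 6x^2 t^2 + 3t^4:
#   f_t(x) = -x^4 + (0.8 - 6 t^2) x^2 + (0.8 t^2 - 3 t^4).
# The two maximizers +-sqrt((0.8 - 6t^2)/2) merge into a single one at x = 0
# once 6 t^2 >= 0.8, i.e. around t ~ 0.37.
import numpy as np

def f_smoothed(x, t):
    return -x**4 + (0.8 - 6 * t**2) * x**2 + (0.8 * t**2 - 3 * t**4)

xs = np.linspace(-1, 1, 2001)
for t in [0.4, 0.3, 0.2, 0.0]:
    print(f"t={t:3.1f}  a maximizer sits near x = {xs[np.argmax(f_smoothed(xs, t))]:+.2f}")
```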
Phase Transition
• If we use infinitely many steps, i.e., decrease t continuously from ∞ to 0:
• for t > θ: ‖x_t − x*‖ ≤ o(1)·‖x*‖
• for t < θ: ⟨x_t, v⟩ = Ω(1)
• So two levels suffice: t_1 = ∞ (infinite smoothing) and t_2 = 0 (power method at 0 smoothing)
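Putting the two levels together gives a short two-stage procedure. The sketch below is my reading of the schematic, not the paper's exact algorithm: initialize from the t = ∞ optimum u/‖u‖ and then run standard tensor power iterations, x ← T(·, x, x)/‖T(·, x, x)‖, as the zero-smoothing local search. The signal-to-noise ratio τ is set a constant factor above d^{3/4} because constants and log factors matter at small d.

```python
# A sketch of the two-stage recipe this slide depicts (my reading, not the
# paper's exact algorithm): initialize at the t = infinity optimum u/||u||,
# then run standard tensor power iterations as the local search at zero smoothing.
import numpy as np

def recover_signal(T, iters=30):
    u = np.einsum('iij->j', T)                 # t = infinity initialization
    x = u / np.linalg.norm(u)
    for _ in range(iters):                     # power method at zero smoothing
        x = np.einsum('ijk,j,k->i', T, x, x)
        x /= np.linalg.norm(x)
    return x

d = 200
tau = 3 * d ** 0.75      # a constant factor above d^{3/4}, so the demo succeeds at small d
rng = np.random.default_rng(3)
v = rng.standard_normal(d); v /= np.linalg.norm(v)
T = tau * np.einsum('i,j,k->ijk', v, v, v) + rng.standard_normal((d, d, d))

x_hat = recover_signal(T)
print("|<x_hat, v>| =", abs(np.dot(x_hat, v)))  # close to 1: the signal is recovered
```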
Conclusions
• The homotopy method gives near-optimal results for tensor PCA
• It is possible to analyze non-convex functions even when they genuinely have bad local optima
Open Problems
• More examples where the homotopy method can be analyzed?
• What happens when the tensor has higher rank?
• General results on the effects of smoothing:
• Which kinds of local optima disappear?
• Do different ways of smoothing/regularization help?