A Stochastic PCA Algorithm with an Exponential Convergence Rate
Ohad Shamir, Weizmann Institute of Science
NIPS Optimization Workshop, December 2014
Ohad Shamir, Stochastic PCA with Exponential Convergence, 1/19

Principal Component Analysis (PCA)
Input: $x_1, \ldots, x_n \in \mathbb{R}^d$
Goal: Find $k$ directions with most variance:
$$\max_{U \in \mathbb{R}^{d \times k} :\, U^\top U = I} \; \frac{1}{n} \sum_{i=1}^{n} \left\| U^\top x_i \right\|^2$$
For $k = 1$: Find leading eigenvector of covariance matrix:
$$\max_{w \in \mathbb{R}^d :\, \|w\| = 1} \; w^\top \left( \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top \right) w$$

Existing Approaches
$$\max_{w \in \mathbb{R}^d :\, \|w\| = 1} \; w^\top \left( \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top \right) w$$
Regime: $n$, $d$ “large”, non-sparse matrix
Approach 1: Eigendecomposition
  Compute leading eigenvector of $\frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top$ exactly (e.g. via QR decomposition)
  Runtime: $O(d^3)$
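As a small illustration of Approach 1, the sketch below forms the covariance matrix and calls a standard symmetric eigensolver. The function name, the synthetic data, and the use of `numpy.linalg.eigh` as a stand-in for the QR-based solver mentioned on the slide are all assumptions made here, not part of the talk.

```python
import numpy as np

def leading_eigenvector_exact(X):
    """Return the leading eigenvector and eigenvalue of A = (1/n) X^T X.

    X has shape (n, d), one data point x_i per row.
    """
    n, _ = X.shape
    A = (X.T @ X) / n                      # d x d matrix (1/n) sum_i x_i x_i^T
    eigvals, eigvecs = np.linalg.eigh(A)   # symmetric solver, eigenvalues ascending
    return eigvecs[:, -1], eigvals[-1]

# Hypothetical data: one coordinate with much larger variance than the rest.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
v1, top = leading_eigenvector_exact(X)
objective = np.mean((X @ v1) ** 2)         # (1/n) sum_i <v1, x_i>^2
```

The value of the PCA objective at `v1` coincides with the top eigenvalue, which is what the maximization over unit vectors is computing.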

Existing Approaches
Approach 2: Power Iterations
  Initialize $w_1$ randomly on unit sphere
  For $t = 1, 2, \ldots$
    $w'_{t+1} := \left( \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top \right) w_t = \frac{1}{n} \sum_{i=1}^{n} \langle w_t, x_i \rangle x_i$
    $w_{t+1} := w'_{t+1} / \| w'_{t+1} \|$
  $O\left( \frac{1}{\lambda} \log \frac{1}{\epsilon} \right)$ iterations for $\epsilon$-optimality ($\lambda$: eigengap)
  $O(nd)$ runtime per iteration
  Overall runtime $O\left( \frac{nd}{\lambda} \log \frac{d}{\epsilon} \right)$
Approach 2.5: Lanczos Iterations
  More complex algorithm, but roughly similar iteration runtime and only $O\left( \frac{1}{\sqrt{\lambda}} \log \frac{1}{\epsilon} \right)$ iterations [Kuczyński and Woźniakowski 1989]
  Overall runtime $O\left( \frac{nd}{\sqrt{\lambda}} \log \frac{d}{\epsilon} \right)$
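The power iteration loop above can be sketched as follows. The function name, the fixed iteration count, and the synthetic data are assumptions for illustration; the key point is that each iteration touches every data point once, giving the $O(nd)$ per-iteration cost without ever forming the $d \times d$ matrix.

```python
import numpy as np

def power_iterations(X, num_iters, rng):
    """Power iterations on A = (1/n) X^T X without forming A explicitly."""
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                 # random point on the unit sphere
    for _ in range(num_iters):
        w_next = X.T @ (X @ w) / n         # (1/n) sum_i <w, x_i> x_i, O(nd) time
        w = w_next / np.linalg.norm(w_next)
    return w

# Hypothetical data with a clear eigengap.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
w = power_iterations(X, num_iters=50, rng=rng)
v1 = np.linalg.eigh(X.T @ X / len(X))[1][:, -1]   # reference answer
alignment = abs(w @ v1)                    # |<w, v1>|, sign of v1 is arbitrary
```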

Existing Approaches
Approach 3: Stochastic/Incremental Algorithms
Example (Oja’s algorithm)
  Initialize $w_1$ randomly on unit sphere
  For $t = 1, 2, \ldots$
    Pick $i_t \in \{1, \ldots, n\}$ (randomly or otherwise)
    $w'_{t+1} := w_t + \eta_t x_{i_t} x_{i_t}^\top w_t$
    $w_{t+1} := w'_{t+1} / \| w'_{t+1} \|$
Also Krasulina 1969; Arora, Cotter, Livescu, Srebro 2012; Mitliagkas, Caramanis, Jain 2013; De Sa, Olukotun, Ré 2014...
$O(d)$ runtime per iteration
Iteration bounds:
  Balsubramani, Dasgupta, Freund 2013: $\tilde{O}\left( \frac{d}{\lambda^2} \left( \frac{1}{\epsilon} + d \right) \right)$
  De Sa, Olukotun, Ré 2014: For a different SGD method, $\tilde{O}\left( \frac{d}{\lambda^2 \epsilon} \right)$
Runtime: $\tilde{O}\left( \frac{d^2}{\lambda^2 \epsilon} \right)$
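A minimal sketch of Oja's algorithm as stated above, with uniformly random sampling. The slide leaves the step sizes $\eta_t$ unspecified, so the schedule $\eta_t = \eta_0 / t$ used here is an assumption, as are the function name and the synthetic data.

```python
import numpy as np

def oja(X, num_iters, eta0, rng):
    """Oja's algorithm with the (assumed) step size eta_t = eta0 / t."""
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                 # random start on the unit sphere
    for t in range(1, num_iters + 1):
        x = X[rng.integers(n)]             # pick i_t uniformly at random
        w = w + (eta0 / t) * (x @ w) * x   # w_t + eta_t x x^T w_t, O(d) time
        w /= np.linalg.norm(w)
    return w

# Hypothetical data with a clear eigengap.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
w = oja(X, num_iters=5000, eta0=1.0, rng=rng)
v1 = np.linalg.eigh(X.T @ X / len(X))[1][:, -1]   # reference answer
alignment = abs(w @ v1)
```

Each iteration reads a single data point, which is where the $O(d)$ cost comes from; the price is the decaying step size, which is what slows convergence to a polynomial rate in $\epsilon$.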

Existing Approaches
Up to constants/log-factors:

  Algorithm       Time per iter.   # iter.                     Runtime
  Exact           -                -                           $d^3$
  Power/Lanczos   $nd$             $1/\lambda^p$               $nd/\lambda^p$
  Incremental     $d$              $d/(\lambda^2 \epsilon)$    $d^2/(\lambda^2 \epsilon)$

($p = 1$ for power iterations and $p = 1/2$ for Lanczos, matching the bounds quoted for each method.)
Main Question
Can we get the best of both worlds? $O(d)$ time per iteration and fast convergence (logarithmic dependence on $\epsilon$)?

Convex Optimization to the Rescue?
Our problem is equivalent to:
$$\min_{w :\, \|w\| = 1} \; \frac{1}{n} \sum_{i=1}^{n} \left( - \langle w, x_i \rangle^2 \right)$$
Much recent progress in solving strongly convex + smooth problems with finite-sum structure
$$\min_{w \in \mathcal{W}} \; \frac{1}{n} \sum_{i=1}^{n} f_i(w)$$
Stochastic algorithms with $O(d)$ runtime per iteration and exponential convergence [Le Roux, Schmidt, Bach 2012; Shalev-Shwartz and Zhang 2012; Johnson and Zhang 2013; Zhang, Mahdavi, Jin 2013; Konečný and Richtárik 2013; Xiao and Zhang 2014; Zhang and Xiao 2014...]

Convex Optimization to the Rescue?
$$\min_{w :\, \|w\| = 1} \; \frac{1}{n} \sum_{i=1}^{n} \left( - \langle w, x_i \rangle^2 \right)$$
Unfortunately:
  Function not strongly convex, or even convex (in fact, concave everywhere)
  Has > 1 global optima, plateaus...
⇒ Existing results don’t work as-is
But: Maybe we can borrow some ideas...
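The two obstacles named above can be read off directly from the objective. The short calculation below is a sketch; the symbol $A$ for the covariance matrix is introduced here for convenience.

```latex
\begin{align*}
f(w) &= \frac{1}{n}\sum_{i=1}^{n} \left(-\langle w, x_i\rangle^2\right) = -\,w^\top A w,
\qquad A = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top \succeq 0, \\
\nabla^2 f(w) &= -2A \preceq 0
\quad\text{(so $f$ is concave on all of $\mathbb{R}^d$)}, \\
f(w) &= f(-w)
\quad\text{(so if $w$ is optimal on the unit sphere, then so is $-w$)}.
\end{align*}
```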

Algorithm
$$\min_{w :\, \|w\| = 1} \; \frac{1}{n} \sum_{i=1}^{n} \left( - \langle w, x_i \rangle^2 \right)$$
Oja Iteration
  Choose $i_t \in \{1, \ldots, n\}$ at random
  $w'_{t+1} = w_t + \eta_t \langle w_t, x_{i_t} \rangle x_{i_t}$
  $w_{t+1} := w'_{t+1} / \| w'_{t+1} \|$
Essentially projected stochastic gradient descent
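To see why the Oja iteration is "essentially" projected stochastic gradient descent, one can write out a gradient step on a single term of the objective. In this sketch the factor of 2 from the gradient is absorbed into the step size, and $f_i$ and $\Pi$ are notation introduced here.

```latex
\begin{align*}
f_i(w) &= -\langle w, x_i \rangle^2,
\qquad \nabla f_i(w) = -2 \langle w, x_i \rangle x_i, \\
w_t - \frac{\eta_t}{2} \nabla f_{i_t}(w_t)
  &= w_t + \eta_t \langle w_t, x_{i_t} \rangle x_{i_t} = w'_{t+1}, \\
\Pi_{\{\|w\| = 1\}}\!\left(w'_{t+1}\right)
  &= \frac{w'_{t+1}}{\|w'_{t+1}\|} = w_{t+1}.
\end{align*}
```

The last line uses the fact that the Euclidean projection of a nonzero vector onto the unit sphere is its normalization.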

Algorithm
Letting $A = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top$, update step is
$$w'_{t+1} = w_t + \eta_t x_{i_t} x_{i_t}^\top w_t = w_t + \underbrace{\eta_t A w_t}_{\text{power/gradient step}} + \underbrace{\eta_t \left( x_{i_t} x_{i_t}^\top - A \right) w_t}_{\text{zero-mean noise}}$$
Main idea: Replace by
$$w'_{t+1} = w_t + \underbrace{\eta A w_t}_{\text{power/gradient step}} + \underbrace{\eta \left( x_{i_t} x_{i_t}^\top - A \right) (w_t - \tilde{u})}_{\text{zero-mean noise}}$$
where $\tilde{u}$ is “close” to $w_t$ (similar to SVRG of Johnson and Zhang (2013))
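A quick numerical check of the decomposition above: both noise terms average to zero over a uniformly random index, but the variance-reduced one is much smaller when the reference point is close to $w_t$. The data, the reference point `u`, and the perturbation size 0.01 are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))          # hypothetical data, rows are x_i
n, d = X.shape
A = X.T @ X / n
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
u = w + 0.01 * rng.standard_normal(d)      # hypothetical reference point close to w

# Noise terms for every possible index i_t, stacked as rows.
oja_noise = (X @ w)[:, None] * X - A @ w              # (x_i x_i^T - A) w
vr_noise = (X @ (w - u))[:, None] * X - A @ (w - u)   # (x_i x_i^T - A)(w - u)

mean_oja = oja_noise.mean(axis=0)          # expectation over uniform i_t
mean_vr = vr_noise.mean(axis=0)
size_oja = np.mean(np.sum(oja_noise ** 2, axis=1))    # mean squared norm
size_vr = np.mean(np.sum(vr_noise ** 2, axis=1))
```

Because the noise is linear in the vector it multiplies, its squared size scales with the squared distance between $w_t$ and the reference point, so it vanishes as the iterates settle.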

Algorithm
VR-PCA
  Parameters: Step size $\eta$, epoch length $m$
  Input: Data set $\{x_i\}_{i=1}^{n}$, initial unit vector $\tilde{w}_0$
  For $s = 1, 2, \ldots$
    $\tilde{u} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top \tilde{w}_{s-1}$
    $w_0 = \tilde{w}_{s-1}$
    For $t = 1, 2, \ldots, m$
      Pick $i_t \in \{1, \ldots, n\}$ uniformly at random
      $w'_t = w_{t-1} + \eta \left( x_{i_t} x_{i_t}^\top (w_{t-1} - \tilde{w}_{s-1}) + \tilde{u} \right)$
      $w_t = \frac{1}{\| w'_t \|} w'_t$
    $\tilde{w}_s = w_m$
To get $k > 1$ directions: Either repeat, or perform orthogonal-like iterations:
  Replace all vectors by $d \times k$ matrices
  Replace normalization step by orthogonalization step
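The VR-PCA pseudocode for $k = 1$ translates almost line by line into NumPy. This is a sketch under stated assumptions: the function name, the synthetic data, the random initialization, and the values of `eta`, `m`, and `num_epochs` are illustrative choices, not the constants required by the analysis.

```python
import numpy as np

def vr_pca(X, eta, m, num_epochs, w0, rng):
    """VR-PCA for k = 1. X has shape (n, d); w0 is a unit vector."""
    n, _ = X.shape
    w_tilde = w0.copy()
    for _ in range(num_epochs):
        u_tilde = X.T @ (X @ w_tilde) / n  # full pass: (1/n) sum_i x_i x_i^T w_tilde
        w = w_tilde.copy()
        for _ in range(m):
            x = X[rng.integers(n)]         # i_t uniform over the data set
            # w' = w + eta * (x x^T (w - w_tilde) + u_tilde), O(d) time
            w_prime = w + eta * (x * (x @ (w - w_tilde)) + u_tilde)
            w = w_prime / np.linalg.norm(w_prime)
        w_tilde = w
    return w_tilde

# Hypothetical data, rescaled so that max_i ||x_i||^2 <= 1.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
X /= np.max(np.linalg.norm(X, axis=1))
w0 = rng.standard_normal(5)
w0 /= np.linalg.norm(w0)
w = vr_pca(X, eta=0.05, m=1000, num_epochs=10, w0=w0, rng=rng)
v1 = np.linalg.eigh(X.T @ X / len(X))[1][:, -1]   # reference answer
error = 1.0 - (w @ v1) ** 2                # 1 - <w, v1>^2
```

Note that the step size stays constant across iterations, in contrast to the decaying schedules typically used with Oja's algorithm; the correction term involving `w_tilde` is what keeps the noise from accumulating.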

Analysis
Theorem
Suppose $\max_i \|x_i\|^2 \le r$, and $A$ has leading eigenvector $v_1$. Assuming $\langle \tilde{w}_0, v_1 \rangle \ge \frac{1}{\sqrt{2}}$, then for any $\delta, \epsilon \in (0, 1)$, if
$$\eta \le \frac{c_1 \delta^2}{r^2} \lambda, \qquad m \ge \frac{c_2 \log(2/\delta)}{\eta \lambda}, \qquad m \eta^2 r^2 + r \sqrt{m \eta^2 \log(2/\delta)} \le c_3$$
(where $c_1, c_2, c_3$ are constants) and we run $T = \left\lceil \frac{\log(1/\epsilon)}{\log(2/\delta)} \right\rceil$ epochs, then
$$\Pr\left( \langle \tilde{w}_T, v_1 \rangle^2 \ge 1 - \epsilon \right) \ge 1 - 2 \log(1/\epsilon)\, \delta$$
Corollary
Picking $\eta, m$ appropriately, $\epsilon$-convergence w.h.p. in $O\left( d \left( n + \frac{1}{\lambda^2} \right) \log \frac{1}{\epsilon} \right)$ runtime
  Exponential convergence with $O(d)$-time iterations
  Proportional to # examples plus eigengap
  Proportional to single data pass if $\lambda \ge 1/\sqrt{n}$
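A rough accounting of where the corollary's runtime comes from. This sketch treats $\delta$ and $r$ as constants and takes $\eta$ and $m$ at the orders suggested by the first two conditions of the theorem; these simplifications are assumptions made here, since the slide only says "picking $\eta, m$ appropriately".

```latex
\begin{align*}
\eta \le \frac{c_1 \delta^2}{r^2}\,\lambda
  &\;\Rightarrow\; \eta = \Theta(\lambda), \\
m \ge \frac{c_2 \log(2/\delta)}{\eta \lambda}
  &\;\Rightarrow\; m = \Theta\!\left(\frac{1}{\lambda^2}\right), \\
\text{cost per epoch}
  &= \underbrace{O(nd)}_{\text{computing } \tilde{u}}
   + \underbrace{O(md)}_{m \text{ stochastic steps}}
   = O\!\left(d\left(n + \frac{1}{\lambda^2}\right)\right), \\
\text{total over } T = O(\log(1/\epsilon)) \text{ epochs}
  &= O\!\left(d\left(n + \frac{1}{\lambda^2}\right)\log\frac{1}{\epsilon}\right).
\end{align*}
```

With these choices $m \eta^2 = \Theta(1)$, so the third condition constrains only the constants. When $\lambda \ge 1/\sqrt{n}$ the term $1/\lambda^2$ is at most $n$, which is the "single data pass" regime mentioned above.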

Proof Idea
Track decay of $F(w_t) = 1 - \langle w_t, v_1 \rangle^2$
Key Lemma
Assuming $\eta = \alpha \lambda$ and $F(w_t) \le 3/4$,
$$\mathbb{E}\left[ F(w_{t+1}) \mid w_t \right] \le \left( 1 - \Theta(\alpha \lambda^2) \right) F(w_t) + O(\alpha^2 \lambda^2)\, F(\tilde{w}_{s-1}).$$

Proof Idea
Assume $\eta = \alpha \lambda$ ($\alpha \ll 1$)
Using martingale arguments: W.h.p., never reach “flat” region in $m \le O\left( \frac{1}{\alpha^2 \lambda^2} \right)$ iterations
⇒ For all $t \le m$
$$\mathbb{E}\left[ F(w_{t+1}) \mid w_t \right] \le \left( 1 - \Theta(\alpha \lambda^2) \right) F(w_t) + O(\alpha^2 \lambda^2)\, F(\tilde{w}_{s-1}).$$
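To see how the per-iteration bound yields a per-epoch contraction, one can unroll it over an epoch. This is a heuristic sketch: it writes $a = \Theta(\alpha\lambda^2)$ and $b = O(\alpha^2\lambda^2)$ (shorthand introduced here), uses $w_0 = \tilde{w}_{s-1}$, and ignores the conditioning on staying out of the flat region, which is what the martingale argument takes care of.

```latex
\begin{align*}
\mathbb{E}[F(w_m)]
 &\le (1-a)^m F(w_0) + b \sum_{j=0}^{m-1} (1-a)^j \, F(\tilde{w}_{s-1}) \\
 &\le \left( (1-a)^m + \frac{b}{a} \right) F(\tilde{w}_{s-1}),
 \qquad \frac{b}{a} = O(\alpha).
\end{align*}
```

Taking $m$ of order $1/(\alpha\lambda^2)$ makes $(1-a)^m$ a small constant, and this is compatible with $m \le O(1/(\alpha^2\lambda^2))$ when $\alpha \ll 1$. Each epoch then shrinks $F$ by a constant factor in expectation, which gives the exponential convergence rate.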
