SLIDE 1

PCA for Distributed Data Sets

Raymond H. Chan Department of Mathematics The Chinese University of Hong Kong Joint work with Franklin Luk (RPI) and Z.-J. Bai (CUHK)

– p. 1/34

SLIDE 2

Grid

Powerful processors with relatively slow links.

[Figure: four powerful teraflop processors, labelled 1–4, connected by slow gigabit networks.]

SLIDE 3

New York State TeraGrid Initiative

State University of NY at Albany
Rensselaer Polytechnic Institute
Brookhaven National Laboratory
State University of NY at Stony Brook
in partnership with IBM and NYSERNet

SLIDE 4

[Figure slide; no recoverable text.]

SLIDE 5

Grid Topology

Four sites, scalable grid architecture
10 to 20 Gb/sec connections
6 TF processors

Computation / Communication = (6 × 10^12) / (0.3 × 10^9) = 20,000!

Communication is the bottleneck: computation is done locally.

SLIDE 6

Principal Component Analysis

Let X be an n × p data matrix, where n ≫ p. The data covariance matrix S is given by

   nS = X^T (I − (1/n) e_n e_n^T) X,

where e_n^T = (1, 1, ..., 1).

PCA ⇔ Karhunen–Loève transform ⇔ Hotelling transform

SLIDE 7

PCA means:

Get the spectral decomposition of nS: nS = V Σ^2 V^T.
Choose the few largest eigenvalues and corresponding eigenvectors V.
Form the principal component vectors X · V.
This yields a low-dimensional representation of the original data.

SLIDE 8

PCA involves only V and Σ

Since (I − (1/n) e_n e_n^T) is symmetric and idempotent,

   nS = X^T (I − (1/n) e_n e_n^T) X
      = X^T (I − (1/n) e_n e_n^T)(I − (1/n) e_n e_n^T) X.

Σ and V can therefore be obtained from the SVD of the centred data matrix:

   (I − (1/n) e_n e_n^T) X = U Σ V^T.

The low-dimensional representation X · V can still be formed.
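As a concrete illustration (my own sketch, not from the slides), here is this observation in NumPy: Σ and V come from the SVD of the centred data matrix, with no covariance matrix formed explicitly. The sizes n, p and the random data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))

# Centring is exactly (I - (1/n) e_n e_n^T) X.
Xc = X - X.mean(axis=0)
U, sigma, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T

# nS = Xc^T Xc = V Sigma^2 V^T, so the SVD of Xc delivers Sigma and V.
nS = Xc.T @ Xc
assert np.allclose(nS, V @ np.diag(sigma**2) @ V.T)

# Low-dimensional representation: keep the first few principal components.
scores = X @ V[:, :2]
```

Note that working with the SVD of Xc rather than the eigendecomposition of nS also avoids squaring the condition number.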

SLIDE 9

Distributed PCA

Big data matrix X: n ≈ 10^12, e.g. in visualization and data mining.

Problem: the data are distributed amongst s processors. Can we find Σ and V without moving X across processors?

SLIDE 10

Data among s processors

Denote

   X = [X_0; X_1; ...; X_{s−1}],    n = sum_{i=0}^{s−1} n_i,

where X_i is n_i × p and resides on processor i. Typical sizes: n_i ≈ 10^12 and p ≈ 10^3.

SLIDE 11

Aim

Compute the PCA of X without moving the data matrices X_i.
Move O(p^α) data across processors instead of O(n_i).

SLIDE 12

Distributed PCA by Qu et al.

1. At processor i, calculate the local PCA using the SVD:

   (I − (1/n_i) e_{n_i} e_{n_i}^T) X_i = U_i Σ_i V_i^T.

Say the matrix has numerical rank k_i. Send x̄_i (the column mean of X_i) and the k_i largest principal components Σ_i and V_i to the central processor. Communication cost = O(p k_i).

SLIDE 13

2. At the central processor: assemble the p × p covariance matrix and find its PCA:

   nS = sum_{i=0}^{s−1} V_i Σ_i^2 V_i^T + sum_{i=0}^{s−1} n_i (x̄_i − x̄)(x̄_i − x̄)^T = V Σ^2 V^T.

Broadcast the first k columns of V. Communication cost = O(p k).
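A minimal NumPy sketch of steps 1–2 (my illustration, not Qu et al.'s code): the local SVDs plus the between-block mean term reassemble the global covariance matrix. The reassembly is exact here because each local numerical rank k_i equals p; truncating k_i < p is what introduces the approximation errors discussed later.

```python
import numpy as np

rng = np.random.default_rng(1)
s, p = 4, 3
blocks = [rng.standard_normal((50, p)) for _ in range(s)]
X = np.vstack(blocks)

xbar = X.mean(axis=0)
nS = np.zeros((p, p))
for Xi in blocks:
    ni = Xi.shape[0]
    xi = Xi.mean(axis=0)                                    # local mean, sent to centre
    Ui, si, Vit = np.linalg.svd(Xi - xi, full_matrices=False)  # local PCA
    nS += Vit.T @ np.diag(si**2) @ Vit                      # V_i Sigma_i^2 V_i^T
    d = (xi - xbar).reshape(-1, 1)
    nS += ni * (d @ d.T)                                    # n_i (xbar_i - xbar)(...)^T

# Sanity check: matches the directly computed scatter matrix.
assert np.allclose(nS, (X - xbar).T @ (X - xbar))
```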

SLIDE 14

3. Calculate the principal component vectors at processor i: X_i V.

SLIDE 15

Analysis of Qu's Approach

Advantage: reduces communication costs:

   O(p n) → O(p sum_{i=0}^{s−1} k_i).

Disadvantages:

Local SVDs introduce approximation errors.
The central processor becomes a bottleneck for communications and computation.

SLIDE 16

Luk's Algorithms

Replace the SVD by QR.

SLIDE 17

1a. At processor i: calculate the QR decomposition of the centred local block:

   (I − (1/n_i) e_{n_i} e_{n_i}^T) X_i = Q_i^(0) R_i^(0),

where R_i^(0) is p × p. Send n_i and x̄_i to the central processor. If i ≥ s/2, send R_i^(0) to processor i − s/2. There is no need to send Q_i^(0).

SLIDE 18

1b. At processor i < s/2: calculate the QR decomposition

   [R_i^(0); R_{i+s/2}^(0)] = Q_i^(1) R_i^(1),

where R_i^(1) is p × p. This equals the R factor of the QRD of the stacked centred blocks (I − (1/n_i) e_{n_i} e_{n_i}^T) X_i and (I − (1/n_{i+s/2}) e_{n_{i+s/2}} e_{n_{i+s/2}}^T) X_{i+s/2}. If i ≥ s/4, send R_i^(1) to processor i − s/4. There is no need to send Q_i^(1).

SLIDE 19

1c. At processor i < s/4: calculate the QR decomposition

   [R_i^(1); R_{i+s/4}^(1)] = Q_i^(2) R_i^(2),

where R_i^(2) is p × p. If i ≥ s/8, send R_i^(2) to processor i − s/8. There is no need to send Q_i^(2).

SLIDE 20

1d. Eventually, at processor 0: calculate the QR decomposition

   [R_0^(l−1); R_1^(l−1)] = Q_0^(l) R_0^(l),

where l = ⌈log_2 s⌉. Send R_0^(l) to the central processor. There is no need to send Q_0^(l).
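The tree reduction of steps 1a–1d can be sketched as follows (an illustrative NumPy version, my own, assuming s is a power of two; in the actual algorithm only the p × p R factors would cross the network):

```python
import numpy as np

rng = np.random.default_rng(2)
s, p = 4, 3                       # s a power of two
blocks = [rng.standard_normal((40, p)) for _ in range(s)]

# Step 1a: local QR of each centred block; only R_i^(0) travels.
Rs = [np.linalg.qr(Xi - Xi.mean(axis=0))[1] for Xi in blocks]

# Steps 1b-1d: halve the list each round by QR-factoring stacked pairs
# [R_i; R_{i+s/2}] -> R_i, until one p x p factor remains at processor 0.
while len(Rs) > 1:
    half = len(Rs) // 2
    Rs = [np.linalg.qr(np.vstack((Rs[i], Rs[i + half])))[1] for i in range(half)]
R = Rs[0]

# R^T R equals the sum of the local scatter matrices, since each QR of a
# stacked pair preserves the Gram matrix of its input.
target = sum((Xi - Xi.mean(axis=0)).T @ (Xi - Xi.mean(axis=0)) for Xi in blocks)
assert np.allclose(R.T @ R, target)
```

This is the same reduction pattern now commonly known as TSQR; each merge is a QR of a 2p × p stack, so the per-round message size is O(p^2) regardless of n_i.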

SLIDE 21

Main Results

Total communication cost = O(⌈log_2 s⌉ p^2). The covariance matrix

   nS = X^T (I − (1/n) e_n e_n^T) X

is given by:

   nS = (R_0^(l))^T R_0^(l) + sum_{i=0}^{s−1} n_i (x̄_i − x̄)(x̄_i − x̄)^T.

SLIDE 22

2. At the central processor: assemble the (s + p) × p data matrix

   Z = [ √n_0 (x̄_0 − x̄)^T;
         √n_1 (x̄_1 − x̄)^T;
         ...;
         √n_{s−1} (x̄_{s−1} − x̄)^T;
         R_0^(l) ].

Notice that nS = Z^T Z.

SLIDE 23

2 (continued). At the central processor: compute the SVD Z = U Σ V^T (after triangularization). Say Z has numerical rank k. Broadcast x̄ and the first k columns of V. Communication cost = O(p k).
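An illustrative sketch of step 2 (my own, not the authors' code): assemble Z from the weighted mean rows and the tree-reduced R factor, and verify that nS = Z^T Z, so the SVD of the small (s + p) × p matrix Z yields the global Σ and V.

```python
import numpy as np

rng = np.random.default_rng(3)
s, p = 4, 3
blocks = [rng.standard_normal((30, p)) for _ in range(s)]
X = np.vstack(blocks)
xbar = X.mean(axis=0)

# Tree-reduced R factor of the locally centred blocks (steps 1a-1d).
Rs = [np.linalg.qr(Xi - Xi.mean(axis=0))[1] for Xi in blocks]
while len(Rs) > 1:
    half = len(Rs) // 2
    Rs = [np.linalg.qr(np.vstack((Rs[i], Rs[i + half])))[1] for i in range(half)]

# Assemble the (s + p) x p matrix Z: one sqrt(n_i)(xbar_i - xbar)^T row per
# processor, followed by the final p x p R factor.
mean_rows = [np.sqrt(Xi.shape[0]) * (Xi.mean(axis=0) - xbar) for Xi in blocks]
Z = np.vstack(mean_rows + [Rs[0]])

# nS = Z^T Z, so the SVD of Z delivers the global Sigma and V.
assert np.allclose(Z.T @ Z, (X - xbar).T @ (X - xbar))
U, sigma, Vt = np.linalg.svd(Z, full_matrices=False)
```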

SLIDE 24

3. At processor i: calculate the principal component vectors X_i V.

SLIDE 25

Analysis of Luk's Algorithm

Advantages over Qu's approach:

Communication cost of the PCA drops: O(p sum_{i=0}^{s−1} k_i) → O(p^2 ⌈log_2 s⌉).
No local PCA approximation errors.
Less congestion at the central processor for communications and computation.
Works directly with the data matrices.

SLIDE 26

Data Updating

Assume global synchronization at t_0, t_1, ..., t_k; i.e. during [t_{k−1}, t_k], new data X_i^(k) are added at processor i.

Aim: find the PCA of the new extended matrix without moving the X_i^(k) across processors.

SLIDE 27

[Table: update blocks by processor and time. Processor i (i = 0, ..., s−1) receives block X_i^(0) at t_0, X_i^(1) at t_1, ..., X_i^(m) at t_m.]

SLIDE 28

At time t_k

Let

   X^(k) = [X_0^(k); X_1^(k); ...; X_{s−1}^(k)],    n^(k) = sum_{i=0}^{s−1} n_i^(k),

where X_i^(k) is n_i^(k) × p.

Assume the PCA of the original matrix X^(0) = X is available by Luk's algorithm.

SLIDE 29

Global Data Matrix at t_m

Denote the stacked updates and their total size by

   X^(m) = [X^(0); X^(1); ...; X^(m)],    g(m) = sum_{k=0}^{m} n^(k).

Aim: find the PCA of its covariance matrix:

   g(m) · S_g(m) = (X^(m))^T (I − (1/g(m)) e_{g(m)} e_{g(m)}^T) X^(m).

SLIDE 30

Our Theorem

Let

   n^(k) · S_k = (X^(k))^T (I − (1/n^(k)) e_{n^(k)} e_{n^(k)}^T) X^(k).

Then

   g(m) S_g(m) = sum_{k=0}^{m} n^(k) S_k
               + sum_{k=1}^{m} [g(k−1) n^(k) / g(k)] (x̄_{g(k−1)} − x̄_{n^(k)})(x̄_{g(k−1)} − x̄_{n^(k)})^T.
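The theorem can be checked numerically. The following sketch (my own, with arbitrary batch sizes) accumulates the batch scatter matrices n^(k) S_k together with the cross terms [g(k−1) n^(k) / g(k)] (x̄_{g(k−1)} − x̄_{n^(k)})(...)^T, and compares the result against the scatter of the full stacked data.

```python
import numpy as np

rng = np.random.default_rng(4)
p, m = 3, 4
batches = [rng.standard_normal((25, p)) for _ in range(m + 1)]   # X^(0), ..., X^(m)

def scatter(A):
    """Return n*S = (centred A)^T (centred A)."""
    Ac = A - A.mean(axis=0)
    return Ac.T @ Ac

gS = sum(scatter(Xk) for Xk in batches)          # sum of the n^(k) S_k terms
g = batches[0].shape[0]                          # g(0) = n^(0)
mean_g = batches[0].mean(axis=0)                 # running mean of first g rows
for Xk in batches[1:]:
    nk = Xk.shape[0]
    d = (mean_g - Xk.mean(axis=0)).reshape(-1, 1)
    gS += (g * nk / (g + nk)) * (d @ d.T)        # cross term of the theorem
    mean_g = (g * mean_g + nk * Xk.mean(axis=0)) / (g + nk)
    g += nk                                      # g(k) = g(k-1) + n^(k)

assert np.allclose(gS, scatter(np.vstack(batches)))
```

Each cross term only needs the running count g(k−1), the running mean, and the new batch's mean, which is why the update never touches the old raw data.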

SLIDE 31

Explanation

The PCA of the update data matrix X^(k) can be obtained by Luk's algorithm, i.e. n^(k) S_k = R_k^T R_k. Then

   g(m) S_g(m) = sum_{k=0}^{m} R_k^T R_k
               + sum_{k=1}^{m} [g(k−1) n^(k) / g(k)] (x̄_{g(k−1)} − x̄_{n^(k)})(x̄_{g(k−1)} − x̄_{n^(k)})^T.

Assemble these to construct the global PCA of X^(m).

SLIDE 32

Analysis of Our Algorithm

The global PCA can be computed without moving the X^(k).
Communication cost is still O(p^2 ⌈log_2 s⌉).
No local PCA approximation errors.
Works directly with the data matrices and update matrices.
Load balancing for communications and computation.

SLIDE 33

Load Balancing

Let s = 2^l. We can allocate all processors to do the QR factorizations such that:

   PCA of X^(k) ← PCA of X^(k−1) + R factor of X^(k).

The PCA of X^(k) is obtained at t_{k+l}. The procedure is periodic with period l and well balanced among the processors.

SLIDE 34

[Figure: processor–time schedule for processors 1–7 over t_0, ..., t_6, showing local QR steps (XR), pairwise R-factor merges (RR), and the communication and computation phases.]