PCA for Distributed Data Sets
Raymond H. Chan Department of Mathematics The Chinese University of Hong Kong Joint work with Franklin Luk (RPI) and Z.-J. Bai (CUHK)
– p. 1/34
Grid: powerful processors with relatively slow links.
[Diagram: processors (circles) on a grid, connected by links.]
– p. 2/34
State University of NY at Albany
Rensselaer Polytechnic Institute
Brookhaven National Laboratory
State University of NY at Stony Brook
in partnership with IBM and NYSERNet
– p. 3/34
– p. 4/34
Four sites, scalable grid architecture
10 to 20 Gb/sec connection
6 TF processors
– p. 5/34
Given a data matrix X (n samples by p variables), the covariance matrix is
S = (1/(n − 1)) X^T (I − (1/n) e_n e_n^T) X,
where e_n = (1, 1, . . . , 1)^T.
– p. 6/34
PCA:
get spectral decomposition of (n − 1) · S = X^T (I − (1/n) e_n e_n^T) X;
choose few largest eigenvalues and corresponding eigenvectors V_k;
form principal component vectors X · V_k:
a low-dimension representation of the original data.
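The steps above can be sketched in NumPy (a minimal illustration; the toy matrix and the choice k = 2 are arbitrary):

```python
import numpy as np

# Toy data: n = 6 samples, p = 3 variables (illustrative only).
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [3.0, 1.0, 2.0],
              [1.0, 0.0, 0.0],
              [2.0, 2.0, 1.0]])
n = X.shape[0]

# Centering: (I - (1/n) e_n e_n^T) X subtracts the column means.
Xc = X - X.mean(axis=0)

# Spectral decomposition of (n - 1) S = Xc^T Xc.
evals, V = np.linalg.eigh(Xc.T @ Xc)

# Keep the k eigenvectors with largest eigenvalues (eigh sorts ascending).
k = 2
Vk = V[:, ::-1][:, :k]

# Principal component vectors: low-dimensional representation of the data.
Y = Xc @ Vk
print(Y.shape)  # (6, 2)
```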
– p. 7/34
Since (I − (1/n) e_n e_n^T) is symmetric and idempotent,
X^T (I − (1/n) e_n e_n^T) X = X^T (I − (1/n) e_n e_n^T)(I − (1/n) e_n e_n^T) X.
So instead compute the SVD
(I − (1/n) e_n e_n^T) X = U Σ V^T;
then X^T (I − (1/n) e_n e_n^T) X = V Σ^2 V^T.
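This equivalence is easy to check numerically; the sketch below (NumPy, random data of my choosing) confirms that the squared singular values of the centered matrix equal the eigenvalues of X^T (I − (1/n) e_n e_n^T) X:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 4
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)          # (I - (1/n) e_n e_n^T) X

# SVD of the centered data ...
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# ... versus the spectral decomposition of Xc^T Xc = V Sigma^2 V^T.
evals, V = np.linalg.eigh(Xc.T @ Xc)

# Squared singular values match the eigenvalues (sorted consistently).
assert np.allclose(np.sort(S**2), np.sort(evals))
```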
– p. 8/34
– p. 9/34
Data distributed over s processors: X = [X_0^T, X_1^T, . . . , X_{s−1}^T]^T,
where X_i is n_i × p and n = Σ_{i=0}^{s−1} n_i.
– p. 10/34
Compute PCA of X without moving the data across processors.
Move O(p^2)-sized factors across processors instead of the O(np) data.
– p. 11/34
Each processor computes the SVD of its locally centered block:
(I − (1/n_i) e_{n_i} e_{n_i}^T) X_i = U_i Σ_i V_i^T.
– p. 12/34
Distributed approach using local SVDs: approximate the global covariance by
X^T (I − (1/n) e_n e_n^T) X ≈ Σ_{i=0}^{s−1} V_i Σ_i^2 V_i^T + Σ_{i=0}^{s−1} n_i (x̄_i − x̄)(x̄_i − x̄)^T,
where x̄_i is the mean of the rows of X_i and x̄ is the global mean.
– p. 13/34
– p. 14/34
Local SVD's introduce approximation errors.
Central processor becomes a bottleneck for communications.
– p. 15/34
– p. 16/34
Our approach: compute the QR decomposition of each locally centered block (I − (1/n_i) e_{n_i} e_{n_i}^T) X_i:
(I − (1/n_i) e_{n_i} e_{n_i}^T) X_i = Q_i^{(0)} R_i^{(0)}, i = 0, 1, . . . , s − 1;
keep only the p × p triangular factors R_i^{(0)}.
– p. 17/34
Merge the triangular factors in pairs: for i = 0, . . . , s/2 − 1, compute the QR decomposition
[ R_i^{(0)} ; R_{i+s/2}^{(0)} ] = Q_i^{(1)} R_i^{(1)},
so R_i^{(1)} carries the same Gram matrix as the two centered blocks
(I − (1/n_i) e_{n_i} e_{n_i}^T) X_i and (I − (1/n_{i+s/2}) e_{n_{i+s/2}} e_{n_{i+s/2}}^T) X_{i+s/2}.
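The key property behind the merge is that QR of the stacked triangular factors preserves the Gram matrix of the stacked data. A small NumPy check (random blocks of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
A0 = rng.standard_normal((30, 4))
A1 = rng.standard_normal((25, 4))

# Local QR factors (only the p x p triangular parts).
R0 = np.linalg.qr(A0)[1]
R1 = np.linalg.qr(A1)[1]

# Merge step: QR of the stacked triangular factors.
R01 = np.linalg.qr(np.vstack([R0, R1]))[1]

# Same Gram matrix as QR of the stacked data: R01^T R01 = [A0; A1]^T [A0; A1].
# (Comparing Gram matrices avoids the sign ambiguity of the R factor.)
R_direct = np.linalg.qr(np.vstack([A0, A1]))[1]
assert np.allclose(R01.T @ R01, R_direct.T @ R_direct)
```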
– p. 18/34
Repeat at the next level: for i = 0, . . . , s/4 − 1,
[ R_i^{(1)} ; R_{i+s/4}^{(1)} ] = Q_i^{(2)} R_i^{(2)}.
– p. 19/34
After l = ⌈log2 s⌉ levels a single factor remains:
[ R_0^{(l−1)} ; R_1^{(l−1)} ] = Q_0^{(l)} R_0^{(l)};
send R_0^{(l)} to the central processor.
– p. 20/34
Total communication costs = O(⌈log2 s⌉ p^2).
The covariance matrix:
(n − 1) S = X^T (I − (1/n) e_n e_n^T) X = (R_0^{(l)})^T R_0^{(l)} + Σ_{i=0}^{s−1} n_i (x̄_i − x̄)(x̄_i − x̄)^T.
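The identity can be verified numerically. The sketch below sums the R_i^T R_i terms directly instead of performing the binary-tree merge, and assumes the global mean x̄ is already available; both are simplifications for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
p, sizes = 3, [40, 25, 35, 20]             # s = 4 local blocks (toy sizes)
blocks = [rng.standard_normal((ni, p)) for ni in sizes]

X = np.vstack(blocks)
xbar = X.mean(axis=0)                      # global mean (assumed known here)

# Each "processor": QR of its locally centered block; only R_i would move.
cov = np.zeros((p, p))
for Xi in blocks:
    ni = Xi.shape[0]
    xi = Xi.mean(axis=0)                   # local mean
    Ri = np.linalg.qr(Xi - xi)[1]
    d = (xi - xbar).reshape(-1, 1)
    cov += Ri.T @ Ri + ni * (d @ d.T)      # within-block + mean-shift terms

# Exact match with the globally centered covariance (n - 1) S = Xc^T Xc.
Xc = X - xbar
assert np.allclose(cov, Xc.T @ Xc)
```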
– p. 21/34
– p. 22/34
– p. 23/34
– p. 24/34
Communication costs on PCA: O(⌈log2 s⌉ p^2) instead of moving all n = Σ_{i=0}^{s−1} n_i rows.
No local PCA approximation errors.
Less congestion in central processor for communications.
Work directly with data matrices.
– p. 25/34
– p. 26/34
– p. 27/34
– p. 28/34
Updating: new data blocks keep arriving; block X_k (arriving at time t_k) has n(k) rows.
Let X^{(m)} = [X_0^T, . . . , X_m^T]^T, with g(m) = Σ_{i=0}^{m} n(i) rows.
We want the PCA of (I − (1/g(m)) e_{g(m)} e_{g(m)}^T) X^{(m)}.
– p. 29/34
Center each block locally: (I − (1/n(k)) e_{n(k)} e_{n(k)}^T) X_k. Then
(X^{(m)})^T (I − (1/g(m)) e_{g(m)} e_{g(m)}^T) X^{(m)}
= Σ_{k=0}^{m} X_k^T (I − (1/n(k)) e_{n(k)} e_{n(k)}^T) X_k
+ Σ_{k=1}^{m} (g(k−1) n(k) / g(k)) (x̄_k − μ_{k−1})(x̄_k − μ_{k−1})^T,
where x̄_k is the mean of X_k and μ_{k−1} is the mean of X^{(k−1)}.
– p. 30/34
Let (I − (1/n(k)) e_{n(k)} e_{n(k)}^T) X_k = Q_k R_k. Then
(X^{(m)})^T (I − (1/g(m)) e_{g(m)} e_{g(m)}^T) X^{(m)}
= Σ_{k=0}^{m} R_k^T R_k + Σ_{k=1}^{m} (g(k−1) n(k) / g(k)) (x̄_k − μ_{k−1})(x̄_k − μ_{k−1})^T.
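A sequential sketch of this update in NumPy (the running mean mu and scatter matrix C are my own variable names): each new block contributes R_k^T R_k plus the mean-shift correction with factor g(k−1) n(k) / g(k).

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3
blocks = [rng.standard_normal((nk, p)) for nk in [15, 10, 20, 5]]  # toy blocks

# Running state: sample count g, running mean mu, accumulated scatter C.
g, mu, C = 0, np.zeros(p), np.zeros((p, p))
for Xk in blocks:
    nk = Xk.shape[0]
    xk = Xk.mean(axis=0)
    Rk = np.linalg.qr(Xk - xk)[1]
    C += Rk.T @ Rk                            # within-block term R_k^T R_k
    if g > 0:
        d = (xk - mu).reshape(-1, 1)
        C += (g * nk / (g + nk)) * (d @ d.T)  # factor g(k-1) n(k) / g(k)
    mu = (g * mu + nk * xk) / (g + nk)        # update the running mean
    g += nk

# C equals the scatter of all data seen so far, centered at the global mean.
X = np.vstack(blocks)
Xc = X - X.mean(axis=0)
assert np.allclose(C, Xc.T @ Xc)
```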
– p. 31/34
Global PCA can be computed without moving the data.
Communication costs still O(p^2 ⌈log2 s⌉).
No local PCA approximation errors.
Work directly with data matrices and update the triangular factors.
Load balancing for communications and computations.
– p. 32/34
PCA of X^{(k)} updated from PCA of X^{(k−1)}.
PCA of X^{(k)} obtained at time t_{k+ℓ}: the procedure is periodic with period ℓ and well-balanced among the processors.
– p. 33/34
[Timing diagram: processors 1–7 against time steps t0–t6; computation steps (local QR, "XR") interleave with communication steps (merging of triangular factors, "RR").]
– p. 34/34