Principal Component Analysis for Distributed Data - David Woodruff (PowerPoint PPT Presentation)

  1. Principal Component Analysis for Distributed Data. David Woodruff, IBM Almaden. Based on works with Ken Clarkson, Ravi Kannan, and Santosh Vempala.

  2. Outline
     1. What is low rank approximation?
     2. How do we solve it offline?
     3. How do we solve it in a distributed setting?

  3. Low rank approximation
     § A is an n x d matrix; think of n points in R^d
     § E.g., A is a customer-product matrix: A_{i,j} = how many times customer i purchased item j
     § A is typically well-approximated by a low rank matrix; its rank is high only because of noise
     § Goal: find a low rank matrix approximating A
     § Easy to store, and the data becomes more interpretable

  4. What is a good low rank approximation?
     § Singular Value Decomposition (SVD): any matrix A = U · Σ · V
       § U has orthonormal columns
       § Σ is diagonal with non-increasing positive entries down the diagonal
       § V has orthonormal rows
     § A_k = argmin_{rank-k matrices B} |A - B|_F, where |C|_F = (Σ_{i,j} C_{i,j}^2)^{1/2}
     § Rank-k approximation: A_k = U_k · Σ_k · V_k (a numpy sketch of this computation follows below)
     § The rows of V_k are the top k principal components
     § Computing A_k exactly is expensive
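A minimal sketch (not from the slides) of computing A_k via the truncated SVD with numpy; the matrix sizes and the rank are illustrative, and the function name rank_k_approx is hypothetical.

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation of A in Frobenius norm (Eckart-Young):
    A_k = U_k Sigma_k V_k, built from the truncated SVD."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

A = np.random.randn(100, 20)               # illustrative n x d matrix
A_k = rank_k_approx(A, k=5)
print(np.linalg.norm(A - A_k, 'fro'))      # Frobenius error |A - A_k|_F
```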

  5. Low rank approximation
     § Goal: output a rank k matrix A', so that |A - A'|_F ≤ (1+ε)|A - A_k|_F
     § Can do this in nnz(A) + (n+d)·poly(k/ε) time [S, CW]
     § nnz(A) is the number of non-zero entries of A

  6. Solution to low-rank approximation [S]
     § Given an n x d input matrix A
     § Compute S·A using a sketching matrix S with k/ε << n rows; S·A takes random linear combinations of the rows of A
     § Project the rows of A onto SA, then find the best rank-k approximation to the points inside of SA (see the code sketch below)
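A minimal single-machine sketch of this pipeline, assuming a dense Gaussian sketching matrix S (the next slide lists the other admissible choices); the sketch width m is illustrative rather than the theoretical O(k/ε) bound, and the function name sketch_low_rank is hypothetical.

```python
import numpy as np

def sketch_low_rank(A, k, m, rng=np.random.default_rng(0)):
    n, d = A.shape
    S = rng.standard_normal((m, n))          # sketching matrix, m ~ k/eps << n
    SA = S @ A                               # random linear combinations of rows of A
    Q, _ = np.linalg.qr(SA.T)                # orthonormal basis for rowspace(SA)
    proj = A @ Q                             # project each row of A onto rowspace(SA)
    U, s, Vt = np.linalg.svd(proj, full_matrices=False)
    best = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approx of projected points
    return best @ Q.T                        # rank-k approximation of A
```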

  7. What is the matrix S?
     § S can be a k/ε x n matrix of i.i.d. normal random variables
     § [S] S can be a k/ε x n Fast Johnson-Lindenstrauss Matrix (uses the Fast Fourier Transform)
     § [CW] S can be a poly(k/ε) x n CountSketch matrix, e.g.:
         [ 0  0  1  0  0  1  0  0 ]
         [ 1  0  0  0  0  0  0  0 ]
         [ 0  0  0 -1  1  0 -1  0 ]
         [ 0 -1  0  0  0  0  0  1 ]
     § S · A can be computed in nnz(A) time! (sketched in code below)
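A minimal sketch of applying a CountSketch matrix in one pass over the rows of A. It uses a dense numpy A for simplicity; with a sparse A the same loop gives the stated nnz(A) time. The helper name countsketch is hypothetical.

```python
import numpy as np

def countsketch(A, m, rng=np.random.default_rng(0)):
    """Compute S @ A for an m x n CountSketch S: each row of A is
    scaled by a random sign and added to one random output row."""
    n = A.shape[0]
    bucket = rng.integers(0, m, size=n)    # row of the single nonzero in each column of S
    sign = rng.choice([-1, 1], size=n)     # value of that nonzero, +1 or -1
    SA = np.zeros((m, A.shape[1]))
    for i in range(n):                     # one pass over the rows of A
        SA[bucket[i]] += sign[i] * A[i]
    return SA
```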

  8. Caveat: projecting the points onto SA is slow
     § Current algorithm:
       1. Compute S·A
       2. Project each of the rows of A onto S·A
       3. Find the best rank-k approximation of the projected points inside the rowspace of S·A
     § Bottleneck is step 2
     § [CW] Approximate the projection: fast algorithm for the approximate regression min_{rank-k X} |X(SA) - A|_F^2 (a sketched variant appears after this slide)
     § nnz(A) + (n+d)·poly(k/ε) time
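The slides do not spell out the [CW] construction; the following is one illustrative way to realize the "approximate the projection via regression" idea, assuming a second Gaussian sketch R applied on the column side to shrink the regression. The function name and the sketch width r are hypothetical.

```python
import numpy as np

def approx_projection(A, SA, r, rng=np.random.default_rng(1)):
    """Approximately solve min_X |X(SA) - A|_F by solving the much
    smaller sketched problem min_X |X(SA)R - AR|_F."""
    d = A.shape[1]
    R = rng.standard_normal((d, r))        # column sketch with r = poly(k/eps) columns
    Xt, *_ = np.linalg.lstsq((SA @ R).T, (A @ R).T, rcond=None)
    return Xt.T @ SA                       # approximate projection; rank-k step as before
```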

  9. Distributed low rank approximation
     § We have fast algorithms, but can they be made to work in a distributed setting?
     § Matrix A is distributed among s servers
     § For t = 1, …, s, we get a customer-product matrix from the t-th shop, stored on server t; server t's matrix = A_t
     § Customer-product matrix A = A_1 + A_2 + … + A_s
     § More general than the row-partition model, in which each customer shops in only one shop

  10. Communication cost of low rank approximation
     § Input: n x d matrix A stored on s servers
       § Server t has an n x d matrix A_t
       § A = A_1 + A_2 + … + A_s
     § Output: server t has an n x d matrix C_t satisfying
       § C = C_1 + C_2 + … + C_s has rank at most k
       § |A - C|_F ≤ (1+ε)|A - A_k|_F
     § Application: distributed clustering
     § Resources: each server runs in polynomial time with linear space; communication is O(1) rounds; bound the total number of words communicated
     § [KVW]: O(skd/ε) communication, independent of n

  11. Protocol
     § Designate one machine the Central Processor (CP)
     § Let S be one of the poly(k/ε) x n random matrices above; S can be generated pseudorandomly from a small seed
     § CP chooses a small seed for S and sends it to all servers
     § Server t computes SA_t and sends it to CP
     § CP computes Σ_{t=1}^s SA_t = SA
     § CP sends an orthonormal basis U^T for the row space of SA to each server
     § Server t computes A_t U (this phase is simulated in code below)
     Problems:
     § Can't output A_t U U^T since the rank is too large
     § Could communicate A_t U to CP; then CP computes the SVD of Σ_t A_t U U^T = A U U^T
     § But communicating A_t U depends on n
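A minimal single-process simulation of this first phase, assuming a Gaussian S that every server regenerates from the CP's shared seed (the function name phase1 and the sketch width m are illustrative):

```python
import numpy as np

def phase1(A_parts, m, seed=42):
    """A_parts[t] is server t's n x d matrix A_t, with A = sum(A_parts)."""
    n = A_parts[0].shape[0]
    S = np.random.default_rng(seed).standard_normal((m, n))  # regenerated from CP's seed
    SA = sum(S @ A_t for A_t in A_parts)    # servers send SA_t; CP sums: S(sum A_t) = SA
    U, _ = np.linalg.qr(SA.T)               # CP: orthonormal basis U for rowspace(SA)
    return U, [A_t @ U for A_t in A_parts]  # server t computes B_t = A_t U locally
```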

  12. Approximate SVD lemma
     § The problem reduces to: server t has an n x r matrix B_t = A_t U, where r = poly(k/ε), and B = Σ_t B_t
     § CP outputs the top k principal components of B
     § Approximate SVD: if W^T ∈ R^{k x r} is the matrix of the top k principal components of PB, where P is a random r/ε^2 x n matrix, then |B - B W W^T|_F ≤ (1+ε)|B - B_k|_F
     § CP sends P to every server
     § Server t sends PB_t to CP, who computes PB = Σ_t PB_t
     § CP computes W and sends everyone W (continued in the code sketch below)
     § Communication is independent of n!
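Continuing the simulation above, a minimal sketch of this second phase under the same assumptions (P taken to be Gaussian; the name phase2 and the width p are illustrative):

```python
import numpy as np

def phase2(B_parts, k, p, seed=7):
    """B_parts[t] is server t's n x r matrix B_t = A_t U."""
    n = B_parts[0].shape[0]
    P = np.random.default_rng(seed).standard_normal((p, n))  # CP sends P, p ~ r/eps^2 rows
    PB = sum(P @ B_t for B_t in B_parts)    # servers send PB_t; CP sums to get PB
    _, _, Vt = np.linalg.svd(PB, full_matrices=False)
    return Vt[:k].T                         # W: top k principal components of PB
```

Server t can then output C_t = B_t W W^T U^T; by linearity the C_t sum to a matrix of rank at most k, as required on slide 10.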

  13. The protocol
     § Phase 1: learn an orthonormal basis U for the row space of SA
     [Figure: the optimal rank-k space inside U]
     § cost ≤ (1+ε)|A - A_k|_F

  14. The protocol
     § Phase 2: find an approximately optimal space W inside of U
     [Figure: the optimal space in U and the approximate space W in U]
     § cost ≤ (1+ε)^2 |A - A_k|_F

  15. Conclusion
     § O(sdk/ε) communication protocol for low rank approximation
     § A bit sloppy with words vs. bits, but this can be dealt with
     § Almost matching Ω(sdk) bit lower bound; can be strengthened to Ω(sdk/ε) in the one-way model
     § Can we remove the one-way restriction?
     § Communication cost of other optimization problems? Linear programming, frequency moments, matching, etc.
