
A sub-linear method for computing columns of functions of sparse matrices



  1. A sub-linear method for computing columns of functions of sparse matrices. Kyle Kloster and David F. Gleich, Purdue University. March 3, 2014. Supported by NSF CAREER 1149756-CCF.

  2. Overview: (1) f(A): problem description and applications; (2) description of the "sub-linear" results; (3) the algorithm for f(A)b; (4) intuition for the proof; (5) experiments on real-world social networks.

  3. The Problem: Functions of Matrices, background. We can apply most functions, e.g. f(x) = cos(x), to any square matrix A, provided f is defined on the eigenvalues of A. One definition uses the Taylor series: cos(x) = 1 - x^2/2! + x^4/4! - ···, so cos(A) = I - A^2/2! + A^4/4! - ···. We can then think of f(A)b as the action of the operator f(A) on b, or as a diffusion on the graph underlying the matrix A.
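
As a quick illustration of the Taylor-series definition, the sketch below evaluates cos(A) by truncating the series and compares the result against SciPy's dense cosm; the 3x3 matrix and the truncation degree are my own toy choices, not from the slides.

```python
import numpy as np
from scipy.linalg import cosm

# Toy check of the Taylor-series definition of a matrix function:
# cos(A) = I - A^2/2! + A^4/4! - ...  (truncated after N even-power terms).
A = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

def cos_taylor(A, N=20):
    term = np.eye(A.shape[0])          # k = 0 term: I
    total = term.copy()
    for k in range(1, N + 1):
        # go from A^{2k-2}/(2k-2)! to -A^{2k}/(2k)! by multiplying by -A^2/((2k-1)(2k))
        term = -(term @ A @ A) / ((2 * k - 1) * (2 * k))
        total = total + term
    return total

print(np.max(np.abs(cos_taylor(A) - cosm(A))))   # truncation error is negligible here
```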

  4. The Problem: Functions of Matrices, applications. Action: f(x) = e^x: the ODE dx/dt = Ax, x(0) = x_0 has solution x(t) = exp{tA} x_0. f(x) = x^{1/p}: if P(t) is the transition matrix of a Markov process and P(1) describes the process over a year, then P^{1/12} describes a month.

  5. The Problem: Functions of Matrices, applications (continued). Action: f(x) = e^x: the ODE dx/dt = Ax, x(0) = x_0 has solution x(t) = exp{tA} x_0. f(x) = x^{1/p}: if P(t) is the transition matrix of a Markov process and P(1) describes the process over a year, then P^{1/12} describes a month. Diffusion: f(x) = (1 - αx)^{-1}: the resolvent yields the PageRank diffusion, with f(P) e_i interpreted as the nodes' importance to node i. f(x) = e^{tx}: e^{tP} e_i, the heat kernel diffusion, offers an alternative ranking of the nodes' importance.
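
To make the two diffusions concrete, here is a dense sketch on a 4-node toy graph of my own construction: the PageRank-style resolvent applied to e_i via a linear solve, and the heat kernel via SciPy's expm. The values of alpha, t, and the graph are illustrative only.

```python
import numpy as np
from scipy.linalg import expm

# Two diffusions seeded at node i on a toy graph:
#   resolvent / PageRank:  (I - alpha*P)^{-1} e_i
#   heat kernel:           exp(t*P) e_i
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0)                 # column-stochastic random-walk matrix
i, alpha, t = 0, 0.85, 3.0
e_i = np.zeros(4); e_i[i] = 1.0

pagerank_diffusion = np.linalg.solve(np.eye(4) - alpha * P, e_i)
heat_kernel_diffusion = expm(t * P) @ e_i
print(pagerank_diffusion)
print(heat_kernel_diffusion)
```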

  6. The Problem: Parameters of f(A)b. A: original motivation: A is a normalized version of the adjacency matrix of a social network, e.g. the Laplacian or random-walk matrix; sparse, small diameter, stochastic, with a power-law degree distribution. Generalized: any nonnegative A with ‖A‖_1 ≤ 1. b: originally b = e_i, i.e. compute a column f(A) e_i. Generalized: b can be any sparse, stochastic vector. f(·): originally f(x) = e^x or (1 - αx)^{-1}. Generalized: any function that decays "fast enough".
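
A one-line sanity check of the generalized norm condition, on a toy matrix of my own: for a nonnegative, column-stochastic matrix the induced 1-norm is the maximum column sum, so ‖A‖_1 = 1 and the condition ‖A‖_1 ≤ 1 holds automatically.

```python
import numpy as np

# The induced 1-norm is the maximum absolute column sum, so a
# column-stochastic matrix satisfies ||A||_1 = 1 <= 1.
A = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
print(np.linalg.norm(A, 1))   # 1.0
```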

  7. The Problem: Columns of the Matrix Exponential. exp{A} is used for link-prediction, node centrality, and clustering. Why? exp{A} = Σ_{k=0}^∞ (1/k!) A^k, and (A^k)_{ij} gives the number of length-k walks from i to j, so large entries of exp{A} denote "important" nodes / links. Used for link-prediction, node ranking, and clustering.

  8. The Problem: Columns of the Matrix Exponential (continued). exp{A} is used for link-prediction, node centrality, and clustering. Why? exp{A} = Σ_{k=0}^∞ (1/k!) A^k, and (A^k)_{ij} gives the number of length-k walks from i to j, so large entries of exp{A} denote "important" nodes / links. Used for link-prediction, node ranking, and clustering. exp{A} is common, but other f(A) can be used: PageRank can be defined from the resolvent, (I - αA)^{-1} = Σ_{k=0}^∞ α^k A^k. So why not replace 1/k! with other coefficients?
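
The walk-counting fact that justifies these series can be checked directly on a tiny graph (a 3-node path of my own choosing):

```python
import numpy as np

# (A^k)_{ij} counts length-k walks from i to j for a 0/1 adjacency matrix.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])          # path graph: 0 - 1 - 2

print((A @ A)[0, 2])               # 1: the single length-2 walk 0 -> 1 -> 2
print((A @ A @ A)[0, 1])           # 2: the walks 0-1-0-1 and 0-1-2-1
```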

  9. The Problem: f(A) as a weighted sum of walks. For f(A) = e^{tA} and f(A) = (1 - αA)^{-1}, how are walks weighted? f(A)b = (f_0 I + f_1 A + f_2 A^2 + f_3 A^3 + ···) b. [Plot: weight (log scale) versus walk length, 0 to 100, for the heat kernel with t = 1, 5, 15 and the resolvent with α = 0.85, 0.99.]
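
The curves in that plot are just the Taylor coefficients evaluated at a fixed walk length; the short sketch below prints the weight each diffusion assigns to length-40 walks (length 40 is an arbitrary choice of mine).

```python
from math import factorial

# Weight assigned to length-k walks: t^k / k! for the heat kernel e^{tA},
# alpha^k for the resolvent (I - alpha*A)^{-1}.
k = 40
for t in (1, 5, 15):
    print(f"heat kernel, t={t}:      {t**k / factorial(k):.3e}")
for alpha in (0.85, 0.99):
    print(f"resolvent, alpha={alpha}: {alpha**k:.3e}")
```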

  10. The Problem: Big Graphs from Social Networks. We've seen the computation (f); what does the domain of inputs look like? Social networks like Twitter, YouTube, Friendster, and LiveJournal. Large: n = 10^6, 10^7, 10^9+. Sparse: |E| = O(n), often ≤ 50n. Difficulty: the "small world" property: diameter ≈ 4 (!). Helpful: power-law degree distribution (see next slide).

  11. The Problem: Power-law degree distribution. [Plot: outdegree (x-axis) versus frequency (y-axis) on log-log scales. Laboratory for Web Algorithms, http://law.di.unimi.it/index.php]

  12. The Problem: Difficulties with current methods (Sidje, TOMS 1998; Al-Mohy and Higham, SISC 2011). Leading methods for f(A)b use Krylov or Taylor methods, which are "basically" repeated mat-vecs. The "small world" property (graph diameter ≤ 4) means repeated mat-vecs fill in rapidly (see next slide). These methods are not designed specifically for sparse networks.

  13. The Problem: Fill-in from repeated matvecs. [Plots: the vectors P^k e_i for k = 1, 2, 3, 4 on a network with n = 1133.]
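
The fill-in effect is easy to reproduce on any sparse graph; the sketch below uses a synthetic random sparse matrix (not the n = 1133 network from the slide) and counts the nonzeros of P^k e_i as k grows.

```python
import numpy as np
import scipy.sparse as sp

# Repeated matvecs with a column-stochastic P fill in quickly:
# count nonzeros of P^k e_i for k = 1..4 on a synthetic sparse graph.
rng = np.random.default_rng(0)
n = 2000
A = sp.random(n, n, density=5.0 / n, random_state=rng, format="csc")
A.data[:] = 1.0
deg = np.asarray(A.sum(axis=0)).ravel()
P = A @ sp.diags(1.0 / np.maximum(deg, 1.0))    # column-normalize

x = np.zeros(n); x[0] = 1.0
for k in range(1, 5):
    x = P @ x
    print(f"k={k}: nnz(P^k e_i) = {np.count_nonzero(x)}")
```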

  14. The Problem: f(P) e_i is a localized vector. [Plot: the column of exp{P} produced by the previous slide's matvecs; x-axis: vector index, y-axis: magnitude of entry.]
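
A complementary check (again on a synthetic graph, so only indicative): compute the full column exp{P} e_i with SciPy's expm_multiply and see how few entries carry most of its mass.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

# How concentrated is the column exp{P} e_i?  Count how many entries are
# needed to cover 90% of its total 1-norm mass.
rng = np.random.default_rng(0)
n = 2000
A = sp.random(n, n, density=5.0 / n, random_state=rng, format="csc")
A.data[:] = 1.0
deg = np.asarray(A.sum(axis=0)).ravel()
P = A @ sp.diags(1.0 / np.maximum(deg, 1.0))

e_i = np.zeros(n); e_i[0] = 1.0
col = expm_multiply(P, e_i)
cumulative = np.sort(np.abs(col))[::-1].cumsum() / np.abs(col).sum()
print(np.searchsorted(cumulative, 0.9) + 1, "of", n, "entries hold 90% of the mass")
```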

  15. The Problem: Local Method. New method: avoid mat-vecs and use a local method instead. Local algorithms run in time proportional to the size of the output: a sparse solution vector means a small runtime. Instead of matvecs, we do specially-selected vector adds using a relaxation method (a sketch follows below).
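
To convey the flavor of such a relaxation method, here is a generic coordinate-relaxation ("push") sketch for the resolvent case x = (I - αP)^{-1} e_i. This is a standard PageRank-style push loop with a simplified threshold rule, not the authors' exact algorithm.

```python
import numpy as np
import scipy.sparse as sp

def resolvent_push(P, i, alpha=0.85, eps=1e-4):
    """Approximate x = (I - alpha*P)^{-1} e_i by relaxing one coordinate of
    the residual at a time; only the neighbors of the relaxed node are
    touched, so no full matvec is ever formed.  P is converted to CSC so a
    column (the neighbors of node u) can be read off cheaply."""
    P = sp.csc_matrix(P)
    n = P.shape[0]
    x = np.zeros(n)
    r = np.zeros(n); r[i] = 1.0          # residual of (I - alpha*P) x = e_i
    queue = [i]
    while queue:
        u = queue.pop()
        if r[u] <= eps:
            continue
        rho = r[u]
        x[u] += rho                       # relax coordinate u
        r[u] = 0.0
        start, end = P.indptr[u], P.indptr[u + 1]
        for v, p_vu in zip(P.indices[start:end], P.data[start:end]):
            old = r[v]
            r[v] += alpha * rho * p_vu    # residual spreads only to u's neighbors
            if old <= eps < r[v]:
                queue.append(v)
    return x
```

For nonnegative P with ‖P‖_1 ≤ 1 and α ∈ (0, 1), each push removes ρ from the residual and adds back at most αρ, so the total residual shrinks monotonically; that is the basic reason such local relaxation methods terminate quickly.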

  16. Main results: Main Result 1. Theorem 1 [action of f on b]: given nonnegative A satisfying ‖A‖_1 ≤ 1, with a power-law degree distribution and max degree d, and sparse stochastic b, our method computes x ≈ f(A)b such that ‖f(A)b - x‖_1 < ε in work(ε) = O( (1/ε)^{C_f} log(1/ε) d^2 log(d)^2 ), i.e. the work "scales as" d^2 log(d)^2 in the graph size, for any function f that decays "fast enough". The constant C_f depends on how quickly the Taylor coefficients of f decay.

  17. Main results: Main Result 1 (continued). Theorem 1 [action of f on b]: given nonnegative A satisfying ‖A‖_1 ≤ 1, with a power-law degree distribution and max degree d, and sparse stochastic b, our method computes x ≈ f(A)b such that ‖f(A)b - x‖_1 < ε in work(ε) = O( (1/ε)^{C_f} log(1/ε) d^2 log(d)^2 ), i.e. the work "scales as" d^2 log(d)^2 in the graph size, for any function f that decays "fast enough". The constant C_f depends on how quickly the Taylor coefficients of f decay. For f(x) = (1 - αx)^{-1}, C_f = 1/(1 - α) (note: α ∈ (0, 1)). For f(x) = e^x, C_f = 3/2. For f(x) = x^{1/p}, C_f = 3p/(5p - 1) (note: p ∈ (0, 1)).

  18. Main results: Main Result 2. Theorem 2 [diffusion of f across a graph]: given column-stochastic A and b, an approximation x̃ ≈ f̃(tA)b can be computed such that ‖f̃(tA)b - x̃‖_∞ < ε in work(ε) = O( 2 f(t) / ε ). (Remark: the 'tilde' denotes a degree-normalized version of the diffusion, e.g. D^{-1} exp{tP} b; we normalize by degrees to adjust for the influence of the stationary distribution of P.) Corollary: f(A)b is a local vector. Proof: because only sublinear work is done, f(A)b cannot have O(n) nonzeros (above the error tolerance ε).
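
The degree normalization in the remark is simple to write down; here is a dense toy sketch (my own 4-node graph and t = 3, purely illustrative).

```python
import numpy as np
from scipy.linalg import expm

# Degree-normalized heat-kernel diffusion: x_tilde = D^{-1} exp(t*P) b,
# which discounts the influence of the stationary distribution of P.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=0)
P = A / deg                           # column-stochastic
b = np.array([1.0, 0.0, 0.0, 0.0])

x = expm(3.0 * P) @ b
x_tilde = x / deg                     # D^{-1} exp(t*P) b
print(x)
print(x_tilde)
```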

  19. Our method (Nexpokit): Overview. Outline of the Nexpokit method (our second method, hk-relax, is related): (1) express f(A)b via a Taylor polynomial; (2) form a large linear system out of the Taylor terms; (3) use a sparse solver to approximate each term's largest entries; (4) combine the approximated terms into a solution.

  20. Our method (Nexpokit): In terms of Taylor terms. Taylor polynomial: f(A)b ≈ (f_0 I + f_1 A + f_2 A^2 + f_3 A^3 + ··· + f_N A^N) b. Compute the terms recursively: v_k = f_k A^k e_i = (f_k / f_{k-1}) A (f_{k-1} A^{k-1} e_i), i.e. v_k = (f_k / f_{k-1}) A v_{k-1}. Then f(A)b ≈ v_0 + v_1 + ··· + v_{N-1} + v_N. (But we want to avoid computing each v_j in full...)
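
For f = exp the ratio f_k / f_{k-1} is 1/k, so the recursion is easy to run in full (which is exactly what the local method later avoids). A dense sketch on a toy matrix of my own, checked against SciPy's expm:

```python
import numpy as np
from scipy.linalg import expm

# Term recursion for exp:  v_0 = e_i,  v_k = (1/k) * A @ v_{k-1},
# and exp(A) e_i ~= v_0 + v_1 + ... + v_N.
A = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
e_i = np.array([1.0, 0.0, 0.0])

N = 15
v = e_i.copy()
x = v.copy()
for k in range(1, N + 1):
    v = (A @ v) / k
    x += v

print(np.linalg.norm(x - expm(A) @ e_i))   # small truncation error
```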

  21. Our method (Nexpokit): Forming a linear system. So we convert the Taylor polynomial into a linear system. For simplicity's sake, we use the example of exp{A} e_i here.

  22. Our method (Nexpokit): Forming a linear system (continued). So we convert the Taylor polynomial into a linear system; for simplicity's sake, we use the example of exp{A} e_i here:

  [  I               ] [ v_0 ]   [ e_i ]
  [ -A/1   I         ] [ v_1 ]   [  0  ]
  [       -A/2   I   ] [ v_2 ] = [  0  ]
  [             ...  ] [  :  ]   [  :  ]
  [           -A/N  I] [ v_N ]   [  0  ]

  where we use the identity v_k = (1/k) A v_{k-1}, which comes from v_k = (f_k / f_{k-1}) A v_{k-1} with f_k = 1/k!, so f_k / f_{k-1} = (k-1)!/k! = 1/k. Then exp{A} e_i ≈ v_0 + v_1 + ··· + v_{N-1} + v_N.
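
A small sketch of that block system (toy 3x3 matrix A and N = 15, both my own choices): build the block lower-bidiagonal matrix with scipy.sparse.bmat, solve it, and check that the summed blocks reproduce exp{A} e_i.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve
from scipy.linalg import expm

# Block system: I on the diagonal, -A/k on the k-th subdiagonal block, so
# solving M [v_0; ...; v_N] = [e_i; 0; ...; 0] enforces v_k = (1/k) A v_{k-1}.
n, N = 3, 15
A = sp.csr_matrix([[0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5],
                   [0.5, 0.5, 0.0]])
I_n = sp.identity(n, format="csr")

blocks = [[None] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):
    blocks[k][k] = I_n
    if k >= 1:
        blocks[k][k - 1] = -A / k
M = sp.bmat(blocks, format="csc")

rhs = np.zeros(n * (N + 1)); rhs[0] = 1.0        # [e_i; 0; ...; 0] with i = 0
v = spsolve(M, rhs).reshape(N + 1, n)
x = v.sum(axis=0)                                # v_0 + v_1 + ... + v_N
print(np.linalg.norm(x - expm(A.toarray())[:, 0]))   # small truncation error
```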
