SLIDE 1

A sub-linear method for computing columns of functions of sparse matrices

Kyle Kloster and David F. Gleich

Purdue University

March 3, 2014 Supported by NSF CAREER 1149756-CCF

SLIDE 2

Overview

  • 1. f (A): problem description and applications
  • 2. Description of “sub-linear” results
  • 3. The Algorithm for f (A)b
  • 4. Intuition for proof
  • 5. Experiments on real-world social networks

SLIDE 3

The Problem

Functions of Matrices: background

We can apply most functions, e.g. f(x) = cos(x), to any square matrix A, provided f is defined on the eigenvalues of A. One definition: Taylor series!

\[
\cos(x) = \frac{1}{0!} - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots
\qquad
\cos(A) = \frac{I}{0!} - \frac{A^2}{2!} + \frac{A^4}{4!} - \cdots
\]

Then we can think of f(A)b as the action of the operator f(A) on b, or as a diffusion on a graph underlying the matrix A.
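For intuition, here is a minimal Python sketch (not the talk's method) that evaluates a matrix function by truncating its Taylor series, exactly as in the cos(A) example above; the toy matrix and the truncation length are made-up illustrative values.

```python
import math
import numpy as np

def taylor_matrix_function(A, coeffs):
    """Return sum_k coeffs[k] * A^k for a small dense square matrix A."""
    term = np.eye(A.shape[0])        # A^0 = I
    total = coeffs[0] * term
    for c in coeffs[1:]:
        term = term @ A              # advance to the next power A^k
        total += c * term
    return total

# cos(A) = I/0! - A^2/2! + A^4/4! - ...
cos_coeffs = [((-1) ** (k // 2)) / math.factorial(k) if k % 2 == 0 else 0.0
              for k in range(20)]
A = np.array([[0.0, 0.5],
              [0.5, 0.0]])
print(taylor_matrix_function(A, cos_coeffs))   # approximately the matrix cosine of A
```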

SLIDE 4

The Problem

Functions of Matrices: applications

Action: f(x) = e^x: the ODE

\[
\frac{dx}{dt} = Ax, \qquad x(0) = x_0
\]

has solution x(t) = exp{tA} x_0.

f(x) = x^{1/p}: if P(t) is the transition matrix for a Markov process and P(1) describes the process over a year, then P^{1/12} describes a month.

SLIDE 5

The Problem

Functions of Matrices: applications

Action: f(x) = e^x: the ODE

\[
\frac{dx}{dt} = Ax, \qquad x(0) = x_0
\]

has solution x(t) = exp{tA} x_0.

f(x) = x^{1/p}: if P(t) is the transition matrix for a Markov process and P(1) describes the process over a year, then P^{1/12} describes a month.

Diffusion: f(x) = (1 − αx)^{-1}: the resolvent yields the PageRank diffusion; f(P)e_i is interpreted as the nodes' importance to node i. f(x) = e^{tx}: e^{tP}e_i, the heat kernel diffusion, offers an alternative ranking of the nodes' importance.
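A toy sketch of the two diffusions just named, computed with dense linear algebra on a made-up 3-node graph (the seed node, α, and t are illustrative values; the talk's whole point is to avoid this dense computation on large graphs):

```python
import numpy as np
from scipy.linalg import expm

# Column-stochastic random-walk matrix P of a 3-node cycle (toy example).
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
e0 = np.array([1.0, 0.0, 0.0])          # seed the diffusion at node 0

alpha, t = 0.85, 5.0
# PageRank-style diffusion: (I - alpha P)^{-1} e_0
pagerank_diffusion = np.linalg.solve(np.eye(3) - alpha * P, e0)
# Heat kernel diffusion: exp{t P} e_0
heat_kernel_diffusion = expm(t * P) @ e0
print(pagerank_diffusion, heat_kernel_diffusion)
```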

SLIDE 6

The Problem

Parameters of f (A)b

A: Original motivation: A is a normalized version of an adjacency matrix from a social network, e.g. the Laplacian or random-walk matrix: sparse, small diameter, stochastic, degree distribution follows a power law. Generalized: any nonnegative A with ‖A‖_1 ≤ 1.

b: Originally b = e_i, i.e. compute a column f(A)e_i. Generalized: b can be any sparse, stochastic vector.

f(·): Originally f(x) = e^x or (1 − αx)^{-1}. Generalized: can be any function decaying "fast enough".

SLIDE 7

The Problem

Columns of the Matrix Exponential

exp{A} is used for link-prediction, node centrality, and clustering. Why?

\[
\exp\{A\} = \sum_{k=0}^{\infty} \frac{1}{k!} A^k
\]

(A^k)_{ij} gives the number of length-k walks from i to j, so... large entries of exp{A} denote "important" nodes / links. Used for link-prediction, node ranking, clustering.

SLIDE 8

The Problem

Columns of the Matrix Exponential

exp{A} is used for link-prediction, node centrality, and clustering. Why?

\[
\exp\{A\} = \sum_{k=0}^{\infty} \frac{1}{k!} A^k
\]

(A^k)_{ij} gives the number of length-k walks from i to j, so... large entries of exp{A} denote "important" nodes / links. Used for link-prediction, node ranking, clustering.

exp{A} is common, but other f(A) can be used. PageRank can be defined from the resolvent:

\[
(I - \alpha A)^{-1} = \sum_{k=0}^{\infty} \alpha^k A^k
\]

→ replace 1/k! with other coefficients?

SLIDE 9

The Problem

f (A) as weighted sum of walks

For f(A) = e^{tA} and f(A) = (1 − αA)^{-1}, how are walks weighted?

\[
f(A)b = \bigl( f_0 I + f_1 A + f_2 A^2 + f_3 A^3 + \cdots \bigr)\, b
\]

[Figure: weight given to walks of each length (y-axis: weight, log scale; x-axis: walk length up to 100), for t = 1, 5, 15 and α = 0.85, 0.99.]
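The curves in that figure are just the Taylor coefficients of the two functions; a few lines of Python (with illustrative parameter values taken from the legend) reproduce them:

```python
import math

def exp_weight(t, k):            # coefficient of A^k in e^{tA}
    return t ** k / math.factorial(k)

def resolvent_weight(alpha, k):  # coefficient of A^k in (1 - alpha*A)^{-1}
    return alpha ** k

for k in (1, 5, 20, 60, 100):
    print(k, exp_weight(5.0, k), resolvent_weight(0.85, k))
```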

SLIDE 10

The Problem

Big Graphs from Social Networks

We’ve seen the computation (f); what does the domain of inputs look like?

  • Social networks like Twitter, YouTube, Friendster, LiveJournal
  • Large: n = 10^6, 10^7, 10^9+
  • Sparse: |E| = O(n), often ≤ 50n
  • Difficulty: "small world" property: diameter ≈ 4 (!)
  • Helpful: power-law degree distribution (picture)

SLIDE 11

The Problem

Power-law degree distribution

[Figure: log-log plot of frequency vs. outdegree, showing the power-law degree distribution.]

[Laboratory for Web Algorithms, http://law.di.unimi.it/index.php]

SLIDE 12

The Problem

Difficulties with current methods: Sidje, TOMS 1998; Al-Mohy and Higham, SISC 2011

Leading methods for f(A)b use Krylov or Taylor approaches: "basically" repeated mat-vecs. The "small world" property (graph diameter ≤ 4) ⇒ repeated mat-vecs fill in rapidly (see picture). These methods are not designed specifically for sparse networks.

SLIDE 13

The Problem

Fill-in from repeated matvecs

Vectors P^k e_i for k = 1, 2, 3, 4; n = 1133.

SLIDE 14

The Problem

f (P)ei is a localized vector

[Figure: the column of exp{P} produced by the previous slide's matvecs; x-axis: vector index, y-axis: magnitude of entry.]

SLIDE 15

The Problem

Local Method

New method: avoid mat-vecs! → use a local method. Local algorithms run in time proportional to the size of the output: a sparse solution vector means a small runtime. Instead of matvecs, we do specially-selected vector adds using a relaxation method.

SLIDE 16

Main results

Main Result 1

Theorem 1 [action of f on b]: Given nonnegative A satisfying ‖A‖_1 ≤ 1, with power-law degree distribution and max degree d, and sparse stochastic b, our method computes x ≈ f(A)b such that ‖f(A)b − x‖_1 < ε in work

\[
\text{work}(\varepsilon) = O\!\left( (1/\varepsilon)^{C_f} \log(1/\varepsilon)\, d^2 \log(d)^2 \right),
\]

i.e. the "work" "scales as" d^2 log(d)^2 in the graph size, for any function f that decays "fast enough". The constant C_f depends on how quickly the Taylor coefficients of f decay.

SLIDE 17

Main results

Main Result 1

Theorem 1 [action of f on b]: Given nonnegative A satisfying ‖A‖_1 ≤ 1, with power-law degree distribution and max degree d, and sparse stochastic b, our method computes x ≈ f(A)b such that ‖f(A)b − x‖_1 < ε in work

\[
\text{work}(\varepsilon) = O\!\left( (1/\varepsilon)^{C_f} \log(1/\varepsilon)\, d^2 \log(d)^2 \right),
\]

i.e. the "work" "scales as" d^2 log(d)^2 in the graph size, for any function f that decays "fast enough". The constant C_f depends on how quickly the Taylor coefficients of f decay.

  • For f(x) = (1 − αx)^{-1}, C_f = 1/(1 − α) (note: α ∈ (0, 1)).
  • For f(x) = e^x, C_f = 3/2.
  • For f(x) = x^{1/p}, C_f = 3p/(5p − 1) (note: p ∈ (0, 1)).

SLIDE 18

Main results

Main Result 2

Theorem 2 [diffusion of f across a graph]: Given column-stochastic A and b, x̃ ≈ f̃(tA)b can be computed such that ‖f̃(P)b − x̃‖_∞ < ε in work

\[
\text{work}(\varepsilon) = O\!\left( \frac{2 f(t)}{\varepsilon} \right).
\]

(Remark: the 'tilde' denotes a degree-normalized version of the diffusion: D^{-1} exp{tP} b, for example. We normalize by degrees to adjust for the influence of the stationary distribution of P.)

Corollary: f(A)b is a local vector. Proof: because sublinear work is done, f(A)b cannot have O(n) nonzeros.

SLIDE 19

Our method: Nexpokit

Overview

Outline of Nexpokit method (our second method, hk-relax, is related)

  • 1. Express f (A)b via a Taylor polynomial
  • 2. Form large linear system out of Taylor terms
  • 3. Use sparse solver to approximate each term’s largest entries
  • 4. Combine approximated terms into a solution

SLIDE 20

Our method: Nexpokit

In terms of Taylor terms

Taylor polynomial:

\[
f(A)b \approx \bigl( f_0 I + f_1 A + f_2 A^2 + f_3 A^3 + \cdots + f_N A^N \bigr)\, b
\]

Compute terms recursively:

\[
v_k = f_k A^k e_i = \frac{f_k}{f_{k-1}} A \bigl( f_{k-1} A^{k-1} e_i \bigr)
\quad\Longrightarrow\quad
v_k = \frac{f_k}{f_{k-1}} A v_{k-1}
\]

Then f(A)b ≈ v_0 + v_1 + · · · + v_{N−1} + v_N. (But we want to avoid computing the v_j in full...)
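A dense-vector sketch of that recursion, for intuition only (the helper name and the toy matrix are made up, and the actual method never forms the v_k in full):

```python
import math
import numpy as np

def taylor_terms_fAb(A, b, coeffs):
    """Approximate f(A) b as sum_k v_k, with v_k = (f_k / f_{k-1}) A v_{k-1}."""
    v = coeffs[0] * b                                # v_0 = f_0 b
    x = v.copy()
    for k in range(1, len(coeffs)):
        v = (coeffs[k] / coeffs[k - 1]) * (A @ v)    # v_k from v_{k-1}
        x += v
    return x

A = np.array([[0.0, 0.5],
              [0.5, 0.0]])
b = np.array([1.0, 0.0])
coeffs = [1.0 / math.factorial(k) for k in range(15)]  # f_k = 1/k!
print(taylor_terms_fAb(A, b, coeffs))                  # approximately exp(A) b
```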

SLIDE 21

Our method: Nexpokit

Forming a linear system

So we convert the Taylor polynomial into a linear system. For simplicity’s sake, we use the example of exp{A}ei here.

SLIDE 22

Our method: Nexpokit

Forming a linear system

So we convert the Taylor polynomial into a linear system. For simplicity's sake, we use the example of exp{A}e_i here.

\[
\begin{bmatrix}
I      &        &        &        &   \\
-A/1   & I      &        &        &   \\
       & -A/2   & \ddots &        &   \\
       &        & \ddots & I      &   \\
       &        &        & -A/N   & I
\end{bmatrix}
\begin{bmatrix} v_0 \\ v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix}
=
\begin{bmatrix} e_i \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\]

where we use the identity v_k = (1/k) A v_{k−1} (which comes from v_k = (f_k/f_{k−1}) A v_{k−1}, since f_k = 1/k!, so f_k/f_{k−1} = (k−1)!/k! = 1/k).

Then exp{A}e_i ≈ v_0 + v_1 + · · · + v_{N−1} + v_N
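To make the construction concrete, here is a small scipy sketch that builds this block system for a toy matrix and solves it directly; the helper name and the toy data are illustrative, and the talk's method never forms M explicitly (that is the point of the next slides):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def build_block_system(A, i, N):
    """Build M and the right-hand side [e_i; 0; ...; 0] for exp{A} e_i."""
    n = A.shape[0]
    I = sp.identity(n, format="csc")
    blocks = [[None] * (N + 1) for _ in range(N + 1)]
    for j in range(N + 1):
        blocks[j][j] = I
        if j >= 1:
            blocks[j][j - 1] = -A / j       # sub-diagonal block -A/j
    M = sp.bmat(blocks, format="csc")
    rhs = np.zeros(n * (N + 1))
    rhs[i] = 1.0                            # e_i in the first block
    return M, rhs

A = sp.csc_matrix([[0.0, 0.5],
                   [0.5, 0.0]])
M, rhs = build_block_system(A, i=0, N=10)
v = spsolve(M, rhs)                          # stacked [v_0; v_1; ...; v_N]
print(v.reshape(11, 2).sum(axis=0))          # approximately column 0 of exp{A}
```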

SLIDE 23

Our method: Nexpokit

Sparse solver: Gauss Southwell

Basic idea of Gauss-Southwell (GS): solving Mx = b when x is "effectively sparse" (i.e. a localized vector).

  • 1. Set x^(0) = 0, r^(0) = b, then iterate:
  • 2. At step k, relax the maximal entry of r^(k) (denote its value m^(k) and its index i), adding it to x^(k):

x^(k+1) = x^(k) + m^(k) · e_i

  • 3. Add the corresponding column of M to the residual:

r^(k+1) = r^(k) − m^(k) · M(:, i)

(A code sketch of this iteration follows below.)
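A minimal dense sketch of that iteration, assuming M has ones on its diagonal (as the block system on the next slides does); this is a generic illustration, not the authors' released code:

```python
import numpy as np

def gauss_southwell(M, b, tol=1e-8, max_steps=100_000):
    """Relax the largest residual entry of M x = b until ||r||_inf < tol."""
    x = np.zeros(len(b))
    r = b.astype(float).copy()          # r^(0) = b, since x^(0) = 0
    for _ in range(max_steps):
        i = int(np.argmax(np.abs(r)))   # index of the maximal residual entry
        m = r[i]
        if abs(m) < tol:
            break
        x[i] += m                       # relax entry i of the solution
        r -= m * M[:, i]                # subtract m times column i of M
    return x, r
```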

SLIDE 24

Our method: Nexpokit

NEXPOKIT

Apply GS to our linear system, M v̄ = ē_i:

\[
\begin{bmatrix} r_0 \\ r_1 \\ r_2 \\ \vdots \\ r_N \end{bmatrix}
=
\begin{bmatrix} e_i \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
-
\begin{bmatrix}
I      &        &        &        &   \\
-A/1   & I      &        &        &   \\
       & -A/2   & \ddots &        &   \\
       &        & \ddots & I      &   \\
       &        &        & -A/N   & I
\end{bmatrix}
\begin{bmatrix} v_0 \\ v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix}
\]

The update can be simplified to a block-wise update:

\[
r^{(k+1)} = \bigl( r^{(k)} - m^{(k)} \, e_j \otimes e_i \bigr) + \frac{m^{(k)}}{j+1} \, A(:, i)
\quad (1)
\]

where the scaled column A(:, i) is added to block j + 1 of the residual. No component of the large linear system is formed explicitly:

  • residual vector stored in a heap (alternative: queue with threshold)
  • matrix M not formed at all
  • blocks v_j not stored separately, but accumulated as one solution vector x = Σ_j v_j

(A code sketch of this relaxation follows below.)
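Here is a sketch of that relaxation in Python for exp{A}e_seed, keeping the residual in a lazily-updated max-heap keyed by (node, block), following update (1) above; the function name, data layout (A_cols[i] lists the nonzeros of column i), and stopping rule are illustrative choices, not the released NEXPOKIT implementation:

```python
import heapq
from collections import defaultdict

def relax_exp_column(A_cols, seed, N=10, tol=1e-4):
    """A_cols[i] = list of (row, value) nonzeros of column i of A, with ||A||_1 <= 1."""
    x = defaultdict(float)               # accumulated solution, x = sum_j v_j
    r = defaultdict(float)               # residual entries, keyed by (node, block)
    r[(seed, 0)] = 1.0
    heap = [(-1.0, (seed, 0))]           # max-heap via negated values
    sum_r = 1.0                          # ||r||_1 (entries stay nonnegative)
    while heap and sum_r > tol:
        neg_m, key = heapq.heappop(heap)
        m = -neg_m
        if r.get(key, 0.0) != m:         # stale heap entry; skip it
            continue
        node, j = key
        del r[key]                       # relax: move this mass into the solution
        x[node] += m
        sum_r -= m
        if j < N:                        # push the scaled column into block j + 1
            for u, a_ui in A_cols[node]:
                add = m * a_ui / (j + 1)
                new_key = (u, j + 1)
                r[new_key] += add
                sum_r += add
                heapq.heappush(heap, (-r[new_key], new_key))
    return x                             # x[i] is approximately (exp{A} e_seed)[i]
```

On a real graph, A_cols would come from the column structure of the normalized adjacency matrix; the queue-with-threshold alternative mentioned above (GSQ in the experiments) would replace the heap with a simple queue.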

SLIDE 25

Proof

Outline of proof

Initial residual is r = e_i, which has ‖r^(0)‖_1 = 1, and it decreases at each step. We show that

  • 1. the decay of ‖r^(k)‖_1 depends on its max value m^(k)
  • 2. the max value m^(k) is bounded below by the average value of r
  • 3. the average value of r depends on the number of nonzeros in r
  • 4. the growth of nnz(r) depends on the degree distribution
  • 5. a power-law degree distribution implies nnz(r) grows slowly, so
  • 6. ‖r‖_1 → 0 at a certain minimum speed!

SLIDE 26

Proof

Decay of r1

Residual r = [r_0; r_1; · · · ; r_N] is indexed by node and block section: r(i, j). For our special linear system, the GS update reduces to: during step k, (1) delete r(i, j)^(k) from r and add it to x_i, our approximation; (2) add the scaled column (m^(k)/j) · A(:, i) to section j of the residual.

Taking the 1-norm of (1) shows

\[
\| r^{(k+1)} \|_1 \le \| r^{(k)} \|_1 - m^{(k)} \Bigl( 1 - \tfrac{1}{j} \Bigr)
\]

Note the (1 − 1/j) factor appears because we're looking specifically at e^x. For the resolvent, f(x) = (1 − αx)^{-1}, this factor would be (1 − α) instead.

SLIDE 27

Proof

Number of nonzeros

The largest entry, m^(k) = r(i, j), is bounded below by the average value of the residual: m^(k) = r(i, j) > ‖r‖_1 / (# nonzeros in r). But we can bound nnz(r) := (# of nonzeros in r) based on the degree of the column of A that we add to the residual each step.

SLIDE 28

Proof

Number of nonzeros

The largest entry, m^(k) = r(i, j), is bounded below by the average value of the residual: m^(k) = r(i, j) > ‖r‖_1 / (# nonzeros in r). But we can bound nnz(r) := (# of nonzeros in r) based on the degree of the column of A that we add to the residual each step. Each iteration we can add no more nonzeros to r than the largest degree among all unvisited nodes. Usually the best we can say is that this is upper bounded by d · (# iterations), where d := d_max, because it's possible every node has max degree. But with the power-law assumption ...

SLIDE 30

Proof

Power-law degree distribution

With the power-law assumption, we know that the t-th largest degree, d_t, is bounded by d_t ≤ C d · t^{−β} for some β near 1 and some constant C. After k iterations, nnz(r) is bounded by the sum of the degrees of the new vertices visited in those k iterations. By step k, this is at most

\[
\mathrm{nnz}(r) \;\le\; \sum_{t=1}^{k} d_t \;\le\; \sum_{t=1}^{k} C d \cdot t^{-1}
\]

In fact, after the first d iterations, d_t is just a small constant, c. Then the sum grows no faster than d log(d) + c · k, so nnz(r) grows like c · k for c ≈ 1 instead of d · k (!).

SLIDE 31

Proof

Convergence

We had ‖r^(k+1)‖_1 ≤ ‖r^(k)‖_1 − m^(k)(1 − 1/j).

The power-law assumption allows the bound −m^(k) ≤ −‖r^(k)‖_1 / (C² + c·k). Then

\[
\begin{aligned}
\| r^{(k+1)} \|_1
&\le \| r^{(k)} \|_1 \left( 1 - \frac{2/3}{C^2 + c k} \right) \\
&\le \| r^{(k)} \|_1 \exp\!\left\{ -\frac{2}{3} \cdot \frac{1}{C^2 + c k} \right\} \\
&\le \| r^{(0)} \|_1 \exp\!\left\{ -\frac{2}{3} \sum_{t=0}^{k} \frac{1}{C^2 + c t} \right\} \\
&\le \exp\!\left\{ -\frac{2}{3} \log(k + C) \right\}
\end{aligned}
\]

so ‖r^(k+1)‖_1 ≤ (k + C)^{−2/3}. (See the paper cited at the end for a precise completion of the proof.)

SLIDE 32

Experimental Results

Runtime v. Graph Size

[Figure: runtime (secs) vs. |E| + |V| on log-log axes, comparing TSGS, TSGSQ, EXPV, MEXPV, and TAYLOR.]

“GSQ” is a version of our Gauss-Southwell method that stores the residual vector in a queue instead of a heap.

SLIDE 33

Experimental Results

Runtime on larger networks

[Figure: runtime (secs) per trial, comparing EXPMV, GSQ, and GS.]

For ljournal-2008, n = 5,363,260, average degree = 14.7.

SLIDE 34

Experimental Results

Runtime on larger networks

[Figure: runtime (secs) per trial, comparing EXPMV, GSQ, and GS.]

For webbase-2001, n = 118,142,155, average degree = 8.6.

SLIDE 35

Experimental Results

Code and Further Details

Code available at http://www.cs.purdue.edu/homes/dgleich/codes/nexpokit

For details and references, see our paper at http://arxiv.org/abs/1310.3423
