Facebook Friends and Matrix Functions
Kyle Kloster, Purdue University
Joint with David F. Gleich (Purdue); supported by NSF CAREER award CCF-1149756.
Graduate Research Day
Use linear algebra to study graphs
V, vertices (nodes); E, edges (links). The degree of a node is the number of edges incident to it, and nodes sharing an edge are neighbors.
Graphs are everywhere: Erdős numbers, Facebook friends, Twitter followers, search engines, Amazon/Netflix recommendations, protein interactions, power grids, Google Maps, air traffic control, sports rankings, cell tower placement, scheduling, parallel programming, Kevin Bacon. Everything.
Questions we can study:
Diameter: is everything just a few hops away from everything else?
Clustering: are there tightly-knit groups of nodes?
Connectivity: how well can each node reach every other node?
Linear algebra, via eigenvalues and matrix functions, sheds light on these questions. These tools require a matrix related to the graph…
Adjacency matrix, $A$:
$A_{ij} = 1$ if nodes $i, j$ share an edge (are adjacent), and $0$ otherwise.

Random-walk transition matrix, $P$:
$P_{ij} = A_{ij} / d_j$, where $d_j$ is the degree of node $j$.
$P$ is stochastic, i.e. its column sums equal 1.
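As a concrete illustration (not from the talk), here is a minimal Python/NumPy sketch of these two matrices for a hypothetical 4-node graph; the edge list is made up for the example:

    import numpy as np

    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # hypothetical toy graph
    n = 4
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0                # undirected: A is symmetric

    d = A.sum(axis=0)                          # d[j] = degree of node j
    P = A / d                                  # divides column j by d_j, so P = A D^{-1}
    assert np.allclose(P.sum(axis=0), 1.0)     # column-stochastic, as claimed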
Uses include local clustering, link prediction, and node centrality.
For $G$ a matrix associated with a network (random-walk $P$, adjacency $A$, or Laplacian $L$), a graph diffusion is a function of that matrix:

$\exp(G) = \sum_{k=0}^{\infty} \frac{1}{k!} G^k$

Recall that $(A^k)_{ij}$ = the number of walks of length $k$ from node $i$ to node $j$.
For a small set of seed nodes $s$, $\exp(A)\, s$ describes the nodes most relevant to $s$: it "sums up" the walks between $i$ and $j$,

$\exp(A)_{ij} = \sum_{k=0}^{\infty} \frac{1}{k!} (A^k)_{ij}$
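A hedged sketch of how one might evaluate such a sum directly: truncate the series at N terms and build each term from the previous one. This is an illustrative Taylor truncation, not the algorithm of the paper:

    import numpy as np

    def taylor_exp_times_vec(A, s, N=30):
        """Approximate exp(A) s by the truncated series sum_{k=0}^{N} A^k s / k!."""
        term = s.astype(float).copy()   # k = 0 term: A^0 s / 0! = s
        total = term.copy()
        for k in range(1, N + 1):
            term = A @ term / k         # turns A^{k-1} s/(k-1)! into A^k s/k!
            total += term
        return total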
"Diffusion scores" of a graph = a weighted sum of probability vectors:

$f = \sum_{k=0}^{\infty} c_k \, P^k s = c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 + \cdots$

where $P$ is the random-walk transition matrix, $s$ is the normalized seed vector, $c_k$ is the weight on stage $k$, and $p_k = P^k s$.
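In code, the weighted-sum view is direct. A minimal sketch (illustrative; the truncation and the function name are assumptions for the example):

    import numpy as np

    def diffusion_scores(P, s, weights):
        """f = sum_k weights[k] * P^k s, truncated after len(weights) stages."""
        p = s.astype(float).copy()      # p_0 = s
        f = weights[0] * p
        for c in weights[1:]:
            p = P @ p                   # p_k = P p_{k-1}
            f += c * p
        return f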
The Heat Kernel uses $c_k = t^k / k!$:

$f = \frac{t^0}{0!} p_0 + \frac{t^1}{1!} p_1 + \frac{t^2}{2!} p_2 + \frac{t^3}{3!} p_3 + \cdots$

Our work is new analysis and algorithms for this diffusion.
PageRank uses $c_k = \beta^k$ at stage $k$:

$f = \beta^0 p_0 + \beta^1 p_1 + \beta^2 p_2 + \beta^3 p_3 + \cdots$

This is the standard, widely-used diffusion we use for comparison, and the linchpin of Google's original success!
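The two diffusions differ only in the weights plugged into the diffusion_scores sketch above; here t, beta, and the truncation N are hypothetical parameters chosen for illustration:

    import math

    N = 30
    t, beta = 1.0, 0.85
    hk_weights = [t**k / math.factorial(k) for k in range(N)]  # heat kernel: t^k / k!
    pr_weights = [beta**k for k in range(N)]                   # PageRank: beta^k
    # (PageRank weights are often scaled by (1 - beta) so they sum to 1.)
    # f_hk = diffusion_scores(P, s, hk_weights)
    # f_pr = diffusion_scores(P, s, pr_weights)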
How the two diffusions compare:

                      PR                                         HK
    good clusters     Local Cheeger inequality: "PR finds        Local Cheeger inequality [Chung 07]
                      near-optimal clusters"
                      [Andersen Chung Lang 06]
    fast algorithm    existing constant-time algorithm           this work
                      [Andersen Chung Lang 06]
Computing $\hat{x} \approx \exp(P)\, s$:

(1) Approximate with a polynomial: $\exp(P)\, s \approx \sum_{k=0}^{N} \frac{1}{k!} P^k s$
(2) Convert to a linear system, $A x^{(k)} \approx b$
(3) Solve with a sparse linear solver (details in paper)

A standard iteration maintains the residual $r^{(k)} := b - A x^{(k)}$ and updates $x^{(k+1)} := x^{(k)} + A r^{(k)}$, but those matrix-vector products with $A$ are big. Gauss-Southwell is a sparse solver that instead "relaxes" only the largest entry in $r$ at each step. Key: we avoid doing the full matrix-vector products. (All my work was showing this can actually be done with bounded error.)
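For intuition, here is a generic Gauss-Southwell sketch for a dense system M x = b. The paper's variant works on the specific system from step (2) and exploits sparsity, but the core move, relaxing only the largest residual entry and touching one column per step, is the same:

    import numpy as np

    def gauss_southwell(M, b, tol=1e-6, max_steps=100000):
        """Solve M x = b by repeatedly relaxing the largest residual entry."""
        x = np.zeros_like(b, dtype=float)
        r = b.astype(float).copy()          # residual r = b - M x
        for _ in range(max_steps):
            j = int(np.argmax(np.abs(r)))   # index of largest residual entry
            if abs(r[j]) < tol:
                break
            rho = r[j] / M[j, j]
            x[j] += rho                     # relax coordinate j: zeroes r[j]
            r -= rho * M[:, j]              # update residual via one column of M
        return x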
Algorithm 1, Weak Convergence: computes $\hat{x} \approx \exp(P)\, s$ with

$\| D^{-1} x - D^{-1} \hat{x} \|_{\infty} < \varepsilon$

in constant work, $\tilde{O}(e \cdot \tfrac{1}{\varepsilon})$.

Conceptually: the diffusion vector quantifies a node's connection to each other node. Divide each node's score by its degree and delete the nodes with score < ε; only a constant number of nodes remain, no matter the size of G! Users spend "reciprocated time" with O(1) others.
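The thresholding step in that conceptual statement, as a small illustrative sketch (the degree and score vectors here are randomly generated stand-ins):

    import numpy as np

    rng = np.random.default_rng(0)
    d = rng.integers(1, 100, size=1000).astype(float)   # fake degree vector
    x_hat = rng.random(1000) * 1e-3                     # fake diffusion scores
    eps = 1e-4
    kept = np.nonzero(x_hat / d >= eps)[0]   # nodes with nontrivial scores survive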
Algorithm 2, Global Convergence (conditional)

Real-world graphs have power-law degree distributions.
[Figure: degrees of nodes in ljournal-2008, a log-log plot of in-degree versus rank. Data: Boldi et al., Laboratory for Web Algorithmics, 2008.]
[Figure, two panels: (1) magnitude of the entries in the solution vector, nnz = 4,815,948; (2) accuracy of the approximation using only the largest entries, plotted as 1-norm error versus number of largest nonzeros retained, on log scales.]

The solution vector

$\sum_{k=0}^{\infty} \frac{1}{k!} A^k s$

has ~5 million nonzeros, yet only the ~3,000 largest entries are needed for $10^{-4}$ accuracy!
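A sketch of the measurement behind the accuracy panel: sort the exact vector's entries by magnitude and count how many must be kept before the 1-norm of the discarded tail drops below a tolerance. This is illustrative, and entries_needed is a made-up helper name:

    import numpy as np

    def entries_needed(x, tol=1e-4):
        """Smallest k such that keeping the k largest-magnitude entries of x
        leaves a discarded tail with 1-norm below tol."""
        mags = np.sort(np.abs(x))[::-1]   # magnitudes, descending
        discarded = np.abs(x).sum()       # tail norm if we keep 0 entries
        for k, m in enumerate(mags):
            if discarded < tol:
                return k
            discarded -= m                # keep one more entry
        return len(mags)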
Algorithm 2, Global Convergence (conditional): computes $\hat{x} \approx \exp(P)\, s$ with

$\| x - \hat{x} \|_{1} < \varepsilon$

in work $\tilde{O}\!\left(d \log d \cdot (1/\varepsilon)^{C}\right)$.

Conceptually: a node's diffusion vector can be approximated with total error < ε using only O(d log d) entries. In real-world networks (i.e. with power-law degrees), no node has a nontrivial connection with more than O(d log d) other nodes.
Runtime experiments on graphs with |V| = O(10^8) and |E| = O(10^9):

[Figure: time (sec) per trial for EXPMV, GSQ, and GS.]

GSQ, GS: our methods. EXPMV: MATLAB's routine, for comparison. A particularly sparse graph benefits us most.
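For reference, SciPy ships a routine in the same family as the MATLAB expmv baseline (assuming it refers to Al-Mohy and Higham's algorithm for the action of the matrix exponential); a hypothetical usage sketch:

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import expm_multiply

    # hypothetical sparse matrix and seed vector, just to show the call
    G = sparse_random(1000, 1000, density=0.01, format='csr', random_state=0)
    s = np.zeros(1000); s[0] = 1.0
    x_baseline = expm_multiply(G, s)   # computes exp(G) @ s without forming exp(G)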
Local clustering via heat kernel: code available at
http://www.cs.purdue.edu/homes/dgleich/codes/hkgrow

Global heat kernel: code available at
http://www.cs.purdue.edu/homes/dgleich/codes/nexpokit/