Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications
Pin-Yu Chen (IBM Research AI)
Joint work with Lingfei Wu (IBM Research AI), Sijia Liu (IBM Research AI), and Indika Rajapakse (Univ. Michigan Ann Arbor)
ICML 2019, June 10, 2019
Poster: Tuesday 6:30-9:00 pm, Pacific Ballroom #265

Graph as a Data Representation

Information-Theoretic Measures between Graphs
Structural reducibility of multilayer networks (unsupervised learning)
De Domenico et al., "Structural reducibility of multilayer networks." Nature Communications 6 (2015).

Von Neumann Graph Entropy (VNGE): Introduction
Quantum information theory:
- $\Phi$ is an $n \times n$ density matrix that is symmetric, positive semidefinite, and satisfies $\mathrm{trace}(\Phi) = 1$.
- $\{\lambda_i\}_{i=1}^{n}$: eigenvalues of $\Phi$.
- Von Neumann entropy: $H = -\mathrm{trace}(\Phi \ln \Phi) = -\sum_{i:\lambda_i > 0} \lambda_i \ln \lambda_i$, which is the Shannon entropy over the eigenspectrum $\{\lambda_i\}_{i=1}^{n}$, since $\sum_i \lambda_i = 1$.
- Computing $H$ generally requires $O(n^3)$ computation complexity.
Graph setting:
- $G = (\mathcal{V}, \mathcal{E}, W) \in \mathcal{G}$: undirected weighted graphs with nonnegative edge weights. $G$ has $|\mathcal{V}| = n$ nodes and $|\mathcal{E}| = m$ edges.
- $L = D - W$: combinatorial graph Laplacian matrix of $G$.
- $D = \mathrm{diag}(\{d_i\})$: diagonal degree matrix.
- $[W]_{ij} = w_{ij}$: edge weight.
Von Neumann graph entropy (VNGE):
- $\Phi = L_N = c \cdot L$, where $c = \frac{1}{\mathrm{trace}(L)} = \frac{1}{\sum_{i \in \mathcal{V}} d_i} = \frac{1}{2 \sum_{(i,j) \in \mathcal{E}} w_{ij}}$.
- $H \le \ln(n - 1)$, with equality when $G$ is a complete graph with identical edge weights.
References:
Braunstein, Samuel L., Sibasish Ghosh, and Simone Severini. "The Laplacian of a graph as a density matrix: a basic combinatorial approach to separability of mixed states." Annals of Combinatorics 10.3 (2006): 291-317.
Passerini, Filippo, and Simone Severini. "The von Neumann entropy of networks." (2008).
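The definitions on this slide can be checked numerically. The following is a minimal sketch, assuming NumPy and a dense symmetric weight matrix; the function name `vnge_exact` and the `1e-12` cutoff for "positive" eigenvalues are illustrative choices, not part of the talk. It computes the exact VNGE through a full eigendecomposition, which is the $O(n^3)$ route that FINGER is designed to avoid.

```python
import numpy as np

def vnge_exact(W):
    """Exact VNGE of an undirected graph given its symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)            # d_i: sum of edge weights at node i
    L = np.diag(degrees) - W           # combinatorial Laplacian L = D - W
    rho = L / np.trace(L)              # density matrix L_N = c * L, c = 1 / trace(L)
    lam = np.linalg.eigvalsh(rho)      # full eigenspectrum (cubic cost)
    lam = lam[lam > 1e-12]             # keep positive eigenvalues only (assumed cutoff)
    return float(-np.sum(lam * np.log(lam)))

n = 6
W_complete = np.ones((n, n)) - np.eye(n)   # complete graph, identical edge weights
W_path = np.diag(np.ones(n - 1), k=1)
W_path = W_path + W_path.T                 # path graph on n nodes

H_complete = vnge_exact(W_complete)        # attains the bound ln(n - 1)
H_path = vnge_exact(W_path)                # strictly below the bound
```

For the complete graph every nonzero eigenvalue of $L_N$ equals $1/(n-1)$, so the entropy equals $\ln(n-1)$, matching the equality case stated above.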

Von Neumann Graph Entropy (VNGE): Introduction
VNGE characterizes the structural complexity of a graph and enables computation of the Jensen-Shannon distance (JSdist) between graphs.
Applications in network learning, computer vision, and data science:
1. Structural reducibility of multilayer networks (hierarchical clustering)
   De Domenico et al., "Structural reducibility of multilayer networks." Nature Communications 6 (2015).
2. Depth-analysis for image processing
   Han, Lin, et al. "Graph characterizations from von Neumann entropy." Pattern Recognition Letters 33.15 (2012): 1958-1967.
   Bai, Lu, and Edwin R. Hancock. "Depth-based complexity traces of graphs." Pattern Recognition 47.3 (2014): 1172-1186.
3. Network-ensemble comparison via edge rewiring
   Li, Zichao, Peter J. Mucha, and Dane Taylor. "Network-ensemble comparisons with stochastic rewiring and von Neumann entropy." SIAM Journal on Applied Mathematics 78.2 (2018): 897-920.
4. Structure-function analysis in genetic networks
   Liu et al., "Dynamic network analysis of the 4D nucleome." bioRxiv, pp. 268318 (2018).
High consistency with classical Shannon graph entropy, which is defined through a probability distribution of a function on subgraphs of G.
   Anand, Kartik, Ginestra Bianconi, and Simone Severini. "Shannon and von Neumann entropy of random networks with heterogeneous expected degree." Physical Review E 83.3 (2011): 036109.
   Anand, Kartik, and Ginestra Bianconi. "Entropy measures for networks: Toward an information theory of complex topologies." Physical Review E 80.4 (2009): 045102.
   Li, Angsheng, and Yicheng Pan. "Structural Information and Dynamical Complexity of Networks." IEEE Transactions on Information Theory 62.6 (2016): 3290-3339.

Outline
The main challenge of exact VNGE computation: it generally requires cubic complexity $O(n^3)$ to obtain the full eigenspectrum, so it is NOT scalable to large graphs.
Our solution: FINGER, a scalable and provably asymptotically correct approximate computation framework for VNGE.
FINGER supports two data modes, batch and online:
(a) Batch mode: $O(n + m)$
(b) Online mode: $O(\Delta n + \Delta m)$
New applications:
1. Anomaly detection in evolving Wikipedia hyperlink networks
2. Bifurcation detection of cellular networks during cell reprogramming
3. Synthesized denial of service attack detection in router networks

Efficient VNGE Computation via FINGER
Recall $H = -\sum_{i=1}^{n} \lambda_i \ln \lambda_i$ ⇒ $O(n^3)$ cubic complexity.
FINGER enables fast and incremental computation of $H$ with an asymptotic approximation guarantee.
Lemma (Quadratic approximation of $H$): The quadratic approximation of the von Neumann graph entropy $H$ via Taylor expansion is equivalent to
$Q = 1 - c^2 \left( \sum_{i \in \mathcal{V}} d_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2 \right)$
- $d_i$: degree (sum of edge weights) of node $i$
- $w_{ij}$: edge weight of edge $(i, j)$
- $c = \frac{1}{2 \sum_{(i,j) \in \mathcal{E}} w_{ij}}$
- $O(n + m)$ linear complexity, with $|\mathcal{V}| = n$ and $|\mathcal{E}| = m$.
- $Q$ can be incrementally updated given graph changes $\Delta G$ ⇒ $O(\Delta n + \Delta m)$ complexity.
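To make the two complexity claims concrete, here is a minimal sketch in plain Python. It assumes an undirected graph without self-loops stored as a dictionary `{(i, j): w}` with one entry per edge; `quadratic_q` and `IncrementalQ` are hypothetical names introduced for illustration, and the running-sum bookkeeping is one straightforward way to get a constant-cost update per changed edge, not necessarily the update rule used in the paper.

```python
from collections import defaultdict

def quadratic_q(edges):
    """Batch Q = 1 - c^2 (sum_i d_i^2 + 2 sum_(i,j) w_ij^2), one pass over the edges."""
    deg = defaultdict(float)
    sum_w = 0.0
    sum_w2 = 0.0
    for (i, j), w in edges.items():
        deg[i] += w
        deg[j] += w
        sum_w += w
        sum_w2 += w * w
    c = 1.0 / (2.0 * sum_w)
    return 1.0 - c * c * (sum(d * d for d in deg.values()) + 2.0 * sum_w2)

class IncrementalQ:
    """Keeps running sums so each edge-weight change costs O(1)."""

    def __init__(self):
        self.deg = defaultdict(float)
        self.weight = defaultdict(float)
        self.sum_w = 0.0    # sum of edge weights
        self.sum_d2 = 0.0   # sum of squared degrees
        self.sum_w2 = 0.0   # sum of squared edge weights

    def add_weight(self, i, j, delta):
        """Add delta to the weight of edge (i, j); assumes i != j."""
        key = (min(i, j), max(i, j))
        for v in key:
            old = self.deg[v]
            self.deg[v] = old + delta
            self.sum_d2 += self.deg[v] ** 2 - old ** 2
        old_w = self.weight[key]
        self.weight[key] = old_w + delta
        self.sum_w2 += self.weight[key] ** 2 - old_w ** 2
        self.sum_w += delta

    def q(self):
        c = 1.0 / (2.0 * self.sum_w)
        return 1.0 - c * c * (self.sum_d2 + 2.0 * self.sum_w2)

path = {(0, 1): 1.0, (1, 2): 1.0}
triangle = dict(path)
triangle[(0, 2)] = 1.0
q_batch = quadratic_q(triangle)

state = IncrementalQ()
for (i, j), w in path.items():
    state.add_weight(i, j, w)
q_before = state.q()            # Q of the 3-node path
state.add_weight(0, 2, 1.0)     # graph change: close the triangle
q_after = state.q()             # agrees with the batch value for the triangle
```

The formula follows from $Q = 1 - \mathrm{trace}(L_N^2)$ with $\mathrm{trace}(L^2) = \sum_i d_i^2 + 2\sum_{(i,j)} w_{ij}^2$; for the unit-weight triangle this gives $Q = 1 - 18/36 = 0.5$.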

Approximate VNGE with Asymptotic Guarantees
Let $\lambda_{\max}$ ($\lambda_{\min}$) be the largest (smallest) positive eigenvalue in $\{\lambda_i\}$, and let $d_{\max}$ be the largest node degree.
Approx. VNGE for batch graph sequence: $\hat{H}(G) = -Q \ln \lambda_{\max}$
Approx. VNGE for online graph sequence: $\tilde{H}(G) = -Q \ln(2c \cdot d_{\max})$
Relation: $\tilde{H} \le \hat{H} \le H$
Theorem ($o(\ln n)$ approximation error with balanced eigenspectrum): If the number of positive eigenvalues $n_+ = \Omega(n)$ and $\lambda_{\min} = \Omega(\lambda_{\max})$, the scaled approximation errors (SAE) satisfy $\frac{H - \hat{H}}{\ln n} \to 0$ and $\frac{H - \tilde{H}}{\ln n} \to 0$ as $n \to \infty$.
Notation: $f(n) = o(h(n))$ and $f(n) = \Omega(h(n))$ mean $\lim_{n \to \infty} \frac{f(n)}{h(n)} = 0$ and $\limsup_{n \to \infty} \left| \frac{f(n)}{h(n)} \right| > 0$, respectively.
Computing $\lambda_{\max}$ only requires $O(n + m)$ operations via power iteration ⇒ $O(n + m)$ linear complexity for $\hat{H}$.
Theorem (Incremental update of $\tilde{H}$ with $O(\Delta n + \Delta m)$ complexity): The VNGE $\tilde{H}(G \oplus \Delta G)$ can be updated by $\tilde{H}(G \oplus \Delta G) = F(\tilde{H}(G), \Delta G)$.
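The two approximations and their ordering can be illustrated on a small graph. The sketch below is plain Python under stated assumptions: the graph is an edge dictionary `{(i, j): w}`, `finger_entropies` is a hypothetical helper name, and the power iteration uses a fixed iteration count and a seeded random start vector rather than a convergence test. The reference value uses the known Laplacian spectrum of the unit-weight path graph on $n$ nodes, $2 - 2\cos(\pi k / n)$ for $k = 0, \dots, n-1$, so no eigensolver is needed.

```python
import math
import random
from collections import defaultdict

def finger_entropies(edges, iters=500, seed=0):
    """Return (H_hat, H_tilde) for an undirected weighted graph {(i, j): w}."""
    deg = defaultdict(float)
    sum_w = sum_w2 = 0.0
    for (i, j), w in edges.items():
        deg[i] += w
        deg[j] += w
        sum_w += w
        sum_w2 += w * w
    c = 1.0 / (2.0 * sum_w)
    q = 1.0 - c * c * (sum(d * d for d in deg.values()) + 2.0 * sum_w2)

    # Power iteration on L = D - W using only the edge list: O(n + m) per step.
    rng = random.Random(seed)
    x = {v: rng.uniform(-1.0, 1.0) for v in deg}
    lam = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(val * val for val in x.values()))
        x = {v: val / norm for v, val in x.items()}
        y = {v: deg[v] * x[v] for v in x}          # y = D x
        for (i, j), w in edges.items():            # y -= W x
            y[i] -= w * x[j]
            y[j] -= w * x[i]
        lam = sum(x[v] * y[v] for v in x)          # Rayleigh quotient estimate
        x = y

    h_hat = -q * math.log(c * lam)                 # lambda_max of L_N is c * lambda_max(L)
    h_tilde = -q * math.log(2.0 * c * max(deg.values()))
    return h_hat, h_tilde

n = 8
path = {(k, k + 1): 1.0 for k in range(n - 1)}
h_hat, h_tilde = finger_entropies(path)

# Exact entropy from the closed-form spectrum of the path graph, trace(L) = 2(n - 1).
eig = [(2.0 - 2.0 * math.cos(math.pi * k / n)) / (2.0 * (n - 1)) for k in range(1, n)]
h_exact = -sum(p * math.log(p) for p in eig)
```

Because the Rayleigh quotient never exceeds the true largest eigenvalue, and the largest Laplacian eigenvalue is at most $2 d_{\max}$, the computed values respect $\tilde{H} \le \hat{H}$ regardless of how far the iteration has converged.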

Numerical Validation on Synthetic Random Graphs
[Figure: Scaled approximation error (SAE) and computation time reduction ratio (%) versus number of nodes (500 to 5000), for Erdős-Rényi graphs and Watts-Strogatz graphs. Legend entries: d = 2, 5, 10, 20, 50, 100, 200 and p_WS = 0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.]
scaled approximation error (SAE) $= \frac{H - H_{\text{approx}}}{\ln n}$
computation time reduction ratio $= \frac{\text{Time}_H - \text{Time}_{H_{\text{approx}}}}{\text{Time}_H}$
- Computation time reduction ratio close to 100% ($O(n^3)$ vs. $O(n + m)$)
- Approximation error decreases as average degree increases
- Regular (random) graphs have smaller (larger) approximation error

Jensen-Shannon Distance between Graphs using FINGER
Two graphs $G$ and $\tilde{G}$ of the same node set $\mathcal{V}$.
KL divergence: $D_{KL}(G \,|\, \tilde{G}) = \mathrm{trace}\left( L_N(G) \cdot [\ln L_N(G) - \ln L_N(\tilde{G})] \right)$ (not symmetric)
Let $\bar{G} = \frac{G \oplus \tilde{G}}{2}$ denote the averaged graph of $G$ and $\tilde{G}$, where $L_N(\bar{G}) = \frac{L_N(G) + L_N(\tilde{G})}{2}$.
The Jensen-Shannon divergence is defined as
$\mathrm{DIV}_{JS}(G, \tilde{G}) = \frac{1}{2} D_{KL}(G \,|\, \bar{G}) + \frac{1}{2} D_{KL}(\tilde{G} \,|\, \bar{G}) = H(\bar{G}) - \frac{1}{2}[H(G) + H(\tilde{G})]$ (symmetric)
The Jensen-Shannon distance is defined as $\mathrm{JSdist}(G, \tilde{G}) = \sqrt{\mathrm{DIV}_{JS}(G, \tilde{G})}$, which is proved to be a valid distance metric.
Briët, Jop, and Peter Harremoës. "Properties of classical and quantum Jensen-Shannon divergence." Physical Review A 79.5 (2009): 052311.
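As a reference point for the approximate algorithms, the entropy form of the Jensen-Shannon divergence can be evaluated exactly on small graphs. This is a minimal sketch assuming NumPy and dense symmetric weight matrices over the same node set; `density`, `entropy`, and `js_distance` are illustrative names, and the clamp at zero only guards against floating-point round-off.

```python
import numpy as np

def density(W):
    """Trace-normalized Laplacian L_N = L / trace(L)."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    return L / np.trace(L)

def entropy(rho):
    """Von Neumann entropy from the positive eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]          # assumed cutoff for "positive"
    return float(-np.sum(lam * np.log(lam)))

def js_distance(W_a, W_b):
    """Exact JSdist via H(averaged graph) - (H(G) + H(G~)) / 2."""
    rho_a, rho_b = density(W_a), density(W_b)
    div = entropy((rho_a + rho_b) / 2.0) - 0.5 * (entropy(rho_a) + entropy(rho_b))
    return float(np.sqrt(max(div, 0.0)))   # clamp tiny negative round-off

W_path = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
W_cycle = W_path.copy()
W_cycle[0, 3] = W_cycle[3, 0] = 1.0        # close the path into a 4-cycle

d_pc = js_distance(W_path, W_cycle)
d_cp = js_distance(W_cycle, W_path)
d_pp = js_distance(W_path, W_path)
```

The sketch works on the density matrices directly, so the averaged graph never has to be built explicitly; the cost is still dominated by three full eigendecompositions.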

FINGER Algorithms for Jensen-Shannon Distance
Jensen-Shannon distance computation via FINGER-$\hat{H}$ (batch mode):
Input: Two graphs $G$ and $\tilde{G}$
Output: $\mathrm{JSdist}(G, \tilde{G})$
1. Obtain $\bar{G} = \frac{G \oplus \tilde{G}}{2}$ and compute $\hat{H}(G)$, $\hat{H}(\tilde{G})$, and $\hat{H}(\bar{G})$ via FINGER (Fast)
2. $\mathrm{JSdist}(G, \tilde{G}) = \sqrt{\hat{H}(\bar{G}) - \frac{1}{2}[\hat{H}(G) + \hat{H}(\tilde{G})]}$
⇒ $O(n + m)$ complexity inherited from $\hat{H}$
Jensen-Shannon distance computation via FINGER-$\tilde{H}$ (online mode):
Input: Graph $G$ and its changes $\Delta G$, approx. VNGE $\tilde{H}(G)$ of $G$
Output: $\mathrm{JSdist}(G, G \oplus \Delta G)$
1. Compute $\tilde{H}(G \oplus \frac{\Delta G}{2})$ and $\tilde{H}(G \oplus \Delta G)$ via FINGER (Inc.)
2. $\mathrm{JSdist}(G, G \oplus \Delta G) = \sqrt{\tilde{H}(G \oplus \frac{\Delta G}{2}) - \frac{1}{2}[\tilde{H}(G) + \tilde{H}(G \oplus \Delta G)]}$
⇒ $O(\Delta n + \Delta m)$ complexity inherited from $\tilde{H}$
$o(\sqrt{\ln n})$ approximation guarantee of JSdist via FINGER (see paper)
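The two-step recipe can be sketched in plain Python as follows. Several choices here are assumptions for illustration rather than the authors' implementation: the degree-based approximation $-Q \ln(2c \cdot d_{\max})$ is recomputed from scratch for each graph instead of being updated incrementally, the averaged graph is built by averaging trace-normalized edge weights so that its normalized Laplacian is the mean of the two normalized Laplacians, and a negative approximate divergence is clamped to zero before the square root. The names `h_tilde`, `averaged_graph`, and `js_distance_finger` are hypothetical, and edge keys are assumed to use a consistent node order.

```python
import math
from collections import defaultdict

def h_tilde(edges):
    """Degree-based approximate VNGE: -Q ln(2 c d_max), O(n + m)."""
    deg = defaultdict(float)
    sum_w = sum_w2 = 0.0
    for (i, j), w in edges.items():
        deg[i] += w
        deg[j] += w
        sum_w += w
        sum_w2 += w * w
    c = 1.0 / (2.0 * sum_w)
    q = 1.0 - c * c * (sum(d * d for d in deg.values()) + 2.0 * sum_w2)
    return -q * math.log(2.0 * c * max(deg.values()))

def averaged_graph(edges_a, edges_b):
    """Edge weights whose Laplacian equals (L_N(G) + L_N(G~)) / 2."""
    avg = defaultdict(float)
    for edges in (edges_a, edges_b):
        trace = 2.0 * sum(edges.values())          # trace(L) = sum of degrees
        for e, w in edges.items():
            avg[e] += 0.5 * w / trace
    return dict(avg)

def js_distance_finger(edges_a, edges_b):
    """Approximate JSdist from three approximate entropies."""
    div = h_tilde(averaged_graph(edges_a, edges_b)) \
        - 0.5 * (h_tilde(edges_a) + h_tilde(edges_b))
    return math.sqrt(max(div, 0.0))                # assumed clamp, see lead-in

path = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
star = {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 1.0}

d_ps = js_distance_finger(path, star)
d_sp = js_distance_finger(star, path)
d_pp = js_distance_finger(path, path)
```

Substituting the $\lambda_{\max}$-based approximation would only change the entropy helper, at the cost of a power iteration per graph.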
