Random Walks, Random Fields, and Graph Kernels
John Lafferty
School of Computer Science, Carnegie Mellon University
Based on work with Avrim Blum, Zoubin Ghahramani, Risi Kondor, Mugizi Rwebangira, Jerry Zhu

Outline
Graph Kernels ⟶ Random Fields
Random Walks ⟵ Continuous Fields
1
Using a Kernel
Linear decision function: $\hat f(x) = \sum_{i=1}^N \alpha_i y_i \langle x, x_i \rangle$

Kernelized decision function: $\hat f(x) = \sum_{i=1}^N \alpha_i y_i K(x, x_i)$
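A minimal sketch of this decision function (the RBF kernel, toy data, and coefficient values below are illustrative assumptions, not from the slides; in practice the $\alpha_i$ come from training, e.g. an SVM dual solve):

```python
import numpy as np

def rbf_kernel(x, xi, gamma=1.0):
    """Gaussian RBF kernel: K(x, x') = exp(-gamma * ||x - x'||^2)."""
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def predict(x, X_train, y_train, alpha, kernel=rbf_kernel):
    """Decision function: f_hat(x) = sum_i alpha_i * y_i * K(x, x_i)."""
    return sum(a * y * kernel(x, xi) for a, y, xi in zip(alpha, y_train, X_train))

# Toy usage with two labeled points.
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([+1, -1])
alpha   = np.array([0.5, 0.5])
print(np.sign(predict(np.array([0.2, 0.1]), X_train, y_train, alpha)))
```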
2
The Kernel Trick
$K(x, x')$ is positive semidefinite:
$$\int_X \int_X f(x) f(x') K(x, x')\, dx'\, dx \ge 0$$
Taking the feature space of functions $F = \{\Phi(x) = K(\cdot, x),\ x \in X\}$ gives the "reproducing property" $g(x) = \langle K(\cdot, x), g \rangle$, so that
$$\langle \Phi(x), \Phi(x') \rangle = \langle K(\cdot, x), K(\cdot, x') \rangle = K(x, x')$$
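On a finite sample, positive semidefiniteness amounts to the Gram matrix having nonnegative eigenvalues. A quick numerical check, assuming an RBF kernel on random toy data:

```python
import numpy as np

# Draw a small sample and form the Gram matrix G_ij = K(x_i, x_j).
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
gamma = 0.5
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
G = np.exp(-gamma * sq_dists)

# Positive semidefiniteness on the sample: all eigenvalues nonnegative.
print(np.linalg.eigvalsh(G).min() >= -1e-10)
```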
3
Structured Data
What if data lies on a graph or other data structure?
[Figure: examples of structured data, including a parse tree (S, NP, VP) and a web graph linking Cornell, CMU, NSF, Google, foobar.com]
4
Combinatorial Laplacian
Think of an edge $e$ as a "tangent vector" at $e^-$. For $f : V \to \mathbb{R}$, $df : E \to \mathbb{R}$ is the 1-form
$$df(e) = f(e^+) - f(e^-)$$
Then $\Delta = d^* d$ (as a matrix) is the discrete analogue of $\operatorname{div} \circ \nabla$.
5
Combinatorial Laplacian
It is an averaging operator:
$$\Delta f(x) = \sum_{y \sim x} w_{xy} (f(x) - f(y)) = d(x)\, f(x) - \sum_{y \sim x} w_{xy} f(y)$$
We say $f$ is harmonic if $\Delta f = 0$. Since $\langle f, \Delta g \rangle = \langle df, dg \rangle$, $\Delta$ is self-adjoint and positive.
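A sketch of building $\Delta = D - W$ from a weighted adjacency matrix and checking the properties above (the small star graph is a toy example):

```python
import numpy as np

# Weighted adjacency of a small star graph (vertex 0 joined to 1 and 2).
W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
D = np.diag(W.sum(axis=1))     # degrees d(x)
L = D - W                      # combinatorial Laplacian

f = np.array([1.0, 2.0, 3.0])
print(L @ f)                   # Delta f(x) = d(x) f(x) - sum_{y ~ x} w_xy f(y)
print(np.allclose(L, L.T))     # self-adjoint
print(np.linalg.eigvalsh(L).min() >= -1e-10)  # positive (semi)definite
```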
6
Diffusion Kernels on Graphs
(Kondor and L., 2002)
If $\Delta$ is the graph Laplacian then, in analogy with the continuous setting,
$$\frac{\partial}{\partial t} K_t = \Delta K_t$$
is the heat equation on a graph. The solution $K_t = e^{t\Delta}$ is the diffusion kernel.
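A sketch of computing $K_t$ by matrix exponentiation. Here I take $\Delta = W - D$ (the negative semidefinite sign convention of Kondor and Lafferty, under which $e^{t\Delta}$ stays bounded as $t$ grows); the 3-vertex path graph is a toy example:

```python
import numpy as np
from scipy.linalg import expm

# Path graph on 3 vertices; Delta = W - D is negative semidefinite.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Delta = W - np.diag(W.sum(axis=1))

K_t = expm(0.5 * Delta)        # diffusion kernel at scale t = 0.5
print(K_t)                     # symmetric, positive definite; rows sum to 1
```

Since $\Delta \mathbf{1} = 0$, each row of $K_t$ sums to 1: heat is conserved under the flow.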
7
Physical Interpretation
$\left(\Delta - \frac{\partial}{\partial t}\right) K = 0$, with initial condition $\delta_x(y)$:
$$e^{t\Delta} f(x) = \int_M K_t(x, y)\, f(y)\, dy$$
For a kernel-based classifier $\hat y(x) = \sum_i \alpha_i y_i K_t(x_i, x)$, the decision function is given by heat flow with initial condition
$$f(x) = \begin{cases} \alpha_i & x = x_i \in \text{positive labeled data} \\ -\alpha_i & x = x_i \in \text{negative labeled data} \\ 0 & \text{otherwise} \end{cases}$$
8
RKHS Representation
The general spectral representation of a kernel as
$$K(x, y) = \sum_{i=1}^n \lambda_i\, \phi_i(x)\, \phi_i(y)$$
leads to the reproducing kernel Hilbert space inner product
$$\Big\langle \sum_i a_i \phi_i,\ \sum_i b_i \phi_i \Big\rangle_{H_K} = \sum_i \frac{a_i b_i}{\lambda_i}$$
For the diffusion kernel, the RKHS inner product is
$$\langle f, g \rangle_{H_K} = \sum_i e^{t\mu_i}\, f_i\, g_i$$
Interpretation: functions with small norm don't "oscillate" rapidly on the graph.
9
Building Up Kernels
If $K_t^{(i)}$ are kernels on $X_i$, then $K_t = \bigotimes_{i=1}^n K_t^{(i)}$ is a kernel on $X_1 \times \cdots \times X_n$. For the hypercube:
$$K_t(x, x') \propto (\tanh t)^{d(x, x')}$$
where $d(x, x')$ is the Hamming distance (see the sketch after the list below). Similar kernels apply to standard categorical data. Other graphs with explicit diffusion kernels:
- Infinite trees (Chung & Yau, 1999)
- Cycles
- Rooted trees
- Strings with wildcards
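Because the hypercube kernel has this closed form, it is cheap to evaluate on categorical data encoded as bit vectors. A minimal sketch (toy vectors; the overall normalization constant is omitted):

```python
import numpy as np

def hypercube_diffusion_kernel(x, xp, t):
    """K_t(x, x') proportional to tanh(t)^d(x, x'), d = Hamming distance."""
    hamming = int(np.sum(x != xp))
    return np.tanh(t) ** hamming   # up to an overall normalization

x  = np.array([0, 1, 1, 0])        # two points on the 4-dimensional hypercube
xp = np.array([0, 0, 1, 1])
print(hypercube_diffusion_kernel(x, xp, t=1.0))   # tanh(1)^2, about 0.58
```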
10
Results on UCI Datasets
Data Set        Hamming err  |SV|     Diffusion err  |SV|    β     Improv. Δerr  Improv. Δ|SV|
Breast Cancer   7.64%        387.0    3.64%          62.9    0.30  62%           83%
Hepatitis       17.98%       750.0    17.66%         314.9   1.50  2%            58%
Income          19.19%       1149.5   18.50%         1033.4  0.40  4%            8%
Mushroom        3.36%        96.3     0.75%          28.2    0.10  77%           70%
Votes           4.69%        286.0    3.91%          252.9   2.00  17%           12%
Recent application to protein classification by Vert and Kanehisa (NIPS 2002).
11
Random Fields View of Combining Labeled/Unlabeled Data
12
Random Fields View
View each vertex $x$ as having a label $f(x) \in \{+1, -1\}$: an Ising model on the graph/lattice, with spins $f : V \to \{+1, -1\}$.

Energy: $H(f) = \frac{1}{2} \sum_{x \sim y} w_{xy} (f(x) - f(y))^2 \equiv -\sum_{x \sim y} w_{xy} f(x) f(y)$ (up to an additive constant)

Gibbs distribution: $P(f) = \frac{1}{Z(\beta)} e^{-\beta H(f)}$, with $\beta = 1/T$

Partition function: $Z(\beta) = \sum_f e^{-\beta H(f)}$
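A sketch of the Ising energy and a brute-force partition function on a toy triangle graph. The enumeration is illustrative only: it scales as $2^{|V|}$, which is exactly why computing $Z(\beta)$ is intractable on real graphs.

```python
import numpy as np
from itertools import product

# Triangle graph with unit edge weights.
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])

def energy(f, W):
    """H(f) = (1/2) sum over edges of w_xy (f(x) - f(y))^2.
    The double sum over ordered pairs counts each edge twice, hence 0.25."""
    diff = f[:, None] - f[None, :]
    return 0.25 * np.sum(W * diff ** 2)

beta = 1.0
Z = sum(np.exp(-beta * energy(np.array(f, dtype=float), W))
        for f in product([-1, +1], repeat=3))   # all 2^3 spin configurations
print(Z)
```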
13
Graph Mincuts
Graph mincuts can be very unbalanced
[Figure: example segmentations illustrating unbalanced mincuts]

Graph mincuts don't exploit the probabilistic properties of random fields. Idea: replace the cut by averages under the Ising model:
$$\mathbb{E}_\beta[f(x)] = \sum_{f \,:\, f|_{\partial S} = f_B} f(x)\, \frac{e^{-\beta H(f)}}{Z(\beta)}$$
14
Pinned Ising Model
[Figure: pinned Ising model averages at β = 3, 2, 1.5, 1, 0.75, 0.1]
15
Not (Provably) Efficient to Approximate
Unfortunately, the analogue of the rapid-mixing result of Jerrum & Sinclair for the ferromagnetic Ising model is not known for mixed boundary conditions. Question: can we compute averages using graph algorithms in the zero-temperature limit?
16
Idea: “Relax” to Statistical Field Theory
Euclidean field theory on the graph/lattice, with fields $f : V \to \mathbb{R}$.

Energy: $H(f) = \frac{1}{2} \sum_{x \sim y} w_{xy} (f(x) - f(y))^2$

Gibbs distribution: $P(f) = \frac{1}{Z(\beta)} e^{-\beta H(f)}$, with $\beta = 1/T$

Partition function: $Z(\beta) = \int e^{-\beta H(f)}\, df$

Physical interpretation: analytic continuation to imaginary time, $t \to it$; Poincaré group → Euclidean group.
17
View from Statistical Field Theory (cont.)
The most probable field is harmonic. Weighted graph $G = (V, E)$, edge weights $w_{xy}$, combinatorial Laplacian $\Delta$. Subgraph $S$ with boundary $\partial S$.

Dirichlet problem, with a unique solution:
$$\Delta f = 0 \text{ on } S, \qquad f|_{\partial S} = f_B$$
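The harmonic solution reduces to a linear solve on the unlabeled block of the Laplacian: writing $B$ for the labeled (boundary) vertices and $U$ for the unlabeled (interior) ones, $\Delta f = 0$ on $S$ gives $f_U = \Delta_{UU}^{-1} W_{UB} f_B$. A minimal sketch on a toy 4-vertex path:

```python
import numpy as np

# Path graph 0 - 1 - 2 - 3; endpoints labeled, interior unlabeled.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W

B, U = [0, 3], [1, 2]              # boundary (labeled) and interior (unlabeled)
f_B = np.array([+1., -1.])         # boundary values

# Harmonic solution: solve Delta_UU f_U = W_UB f_B.
f_U = np.linalg.solve(L[np.ix_(U, U)], W[np.ix_(U, B)] @ f_B)
print(f_U)                         # [1/3, -1/3]: linear interpolation on the path
```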
18
Random Walk Solution
Perform a random walk on the unlabeled data, stopping when a labeled point is hit. What is the probability of hitting a positive labeled point before a negative one? It is precisely the minimum-energy (continuous) random field: label propagation. Related work by Szummer and Jaakkola (NIPS 2001).
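A Monte Carlo sketch of this equivalence on the same toy 4-path used above (the walk is uniform over neighbors since all edge weights are 1):

```python
import numpy as np

rng = np.random.default_rng(0)
neighbors = {1: [0, 2], 2: [1, 3]}   # interior vertices of the 4-path
labels = {0: +1, 3: -1}              # absorbing labeled endpoints

def walk_label(start):
    """Walk uniformly over neighbors until a labeled vertex is hit."""
    v = start
    while v not in labels:
        v = rng.choice(neighbors[v])
    return labels[v]

est = np.mean([walk_label(1) for _ in range(20000)])
print(est)                           # approx 1/3, matching the harmonic f_U[0]
```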
19
Unconstrained vs. Constrained

[Figure: unconstrained and constrained field solutions, with values in [−1, 1]]
20
View from Statistical Field Theory
In the one-dimensional case, the low-temperature limit of the average Ising model is the same as the minimum-energy Euclidean field (Landau). Intuition: average over graph s-t mincuts; the harmonic solution is linear. Not true in general...
21
Computing the Partition Function
Let $\lambda_i$ be the spectrum of $\Delta$ with Dirichlet boundary conditions:
$$Z(\beta) = e^{-\beta H(f^*)} \left(\frac{\pi}{\beta}\right)^{n/2} \frac{1}{\sqrt{\det \Delta}}, \qquad \det \Delta = \prod_{i=1}^n \lambda_i$$
By a generalization of the matrix-tree theorem (Chung & Langlands, '96), $\det \Delta$ counts the rooted spanning forests of the graph, weighted by the vertex degrees $\deg(i)$.
22
Connection with Diffusion Kernels
Again take $\Delta$, the combinatorial Laplacian with Dirichlet boundary conditions (zero on the labeled data). For the diffusion kernel $K_t = e^{t\Delta}$, let
$$K = \int_0^\infty K_t\, dt$$
The solution to the Dirichlet problem (label propagation, the minimum-energy continuous field) is
$$f^*(x) = \sum_{z \in \text{"fringe"}} K(x, z)\, f_D(z)$$
23
Connection with Diffusion Kernels (cont.)
We want to solve Laplace's equation $\Delta f = g$; the solution is given in terms of $\Delta^{-1}$. A quick way to see the connection is through the spectral representation:
$$\Delta_{x,x'} = \sum_i \mu_i\, \phi_i(x)\, \phi_i(x')$$
$$K_t(x, x') = \sum_i e^{-t\mu_i}\, \phi_i(x)\, \phi_i(x')$$
$$\Delta^{-1}_{x,x'} = \sum_i \frac{1}{\mu_i}\, \phi_i(x)\, \phi_i(x') = \int_0^\infty K_t(x, x')\, dt$$
Used by Chung and Yau (2000).
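A quick numerical check of the spectral identity, using the positive semidefinite sign convention $\Delta = D - W$ so that $K_t = e^{-t\Delta}$ and $\int_0^\infty K_t\, dt = \Delta^{-1}$ on the Dirichlet block (the matrix below is the interior block from the 4-path example):

```python
import numpy as np

# Interior (Dirichlet) block of the Laplacian from the 4-path example;
# its eigenvalues mu_i are strictly positive, so the inverse exists.
L_UU = np.array([[2., -1.],
                 [-1., 2.]])
mu, phi = np.linalg.eigh(L_UU)       # columns of phi are eigenvectors phi_i

# sum_i (1/mu_i) phi_i phi_i^T, the spectral form of Delta^{-1},
# which equals the t-integral of sum_i e^{-t mu_i} phi_i phi_i^T.
inv_spectral = (phi / mu) @ phi.T
print(np.allclose(inv_spectral, np.linalg.inv(L_UU)))   # True
```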
24
Bounds on Covering Numbers and Generalization Error, Continuous Case
Eigenvalue bounds from differential geometry (Li and Yau):
$$c_1 \left(\frac{j}{V}\right)^{2/d} \le \mu_j \le c_2 \left(\frac{j+1}{V}\right)^{2/d}$$
give bounds on SVM hypothesis-class covering numbers:
$$\log N(\epsilon, F_R(x)) = O\!\left(\frac{V}{t^{d/2}} \log^{\frac{d+2}{2}} \frac{1}{\epsilon}\right)$$
25
Bounds on Generalization Error
Better bounds on generalization error are now available based on Rademacher averages involving the trace of the kernel (Bartlett, Bousquet, & Mendelson, preprint). Question: Can the diffusion kernel connection be exploited to get transductive generalization error bounds for the random walks approach?
26
Summary
- Random fields with discrete class labels: intractable, unstable
- Continuous fields: tractable, more desirable behavior for segmentation and labeling
- Intimate connections with random walks, electric networks, graph flows, and diffusion kernels
- Advantages/disadvantages?
27