SLIDE 1

Alexander J. Smola and Risi I. Kondor: Kernels and Regularization on Discrete Domains, Page 1

Kernels and Regularization on Discrete Domains

Alexander J. Smola and Risi I. Kondor Alex.Smola@anu.edu.au and risi@cs.columbia.edu Machine Learning Program Australian National University and National ICT Australia Department of Computer Science Columbia University

http://alex.smola.org/talks/coltgraph2003.pdf

SLIDE 2

Outline

• Learning Problem
• The Graph Laplacian
  • Definition and Properties
  • Invariance Theorem
• Regularization and Green's Functions on Graphs
  • Regularization by the Graph Laplacian
  • Kernels
  • Connections to Clustering
• Approximate and Fast Computation
  • Products of Graphs
  • Iterative Expansions and Polynomial Approximation
• Summary and Outlook

SLIDE 3

Learning Problem

Estimation Problem
Given some observations (xi, yi) ∈ X × Y, find an estimator f : X → Y which minimizes some cost of misprediction. Specifically, f is a member of a Reproducing Kernel Hilbert Space, so we need a kernel k(x, x′).

Real Data: Discrete data
• Categorical variables, e.g. (English, high school, butcher, unemployed)
• Similarity between pairs of observations, e.g. the set of k-nearest neighbours
• Web pages
• Regulatory networks

Problem
We need a measure of smoothness on functions f defined on X, where X is a discrete set.

SLIDE 4

Graphs

Graph
Define G(V, E) as a set of vertices V and edges E.

Connectivity Matrix
W ∈ R^{|V|×|V|}, where Wij = 1 if i and j share an edge and 0 otherwise. More generally, weighted graphs with Wij ∈ [0, ∞) are allowed.

Random Walk
From vertex i to j with probability

p(j|i) = Wij / Σl Wil = Wij / Dii
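The transition probabilities of this random walk are easy to check numerically. A minimal sketch (the small example graph below is an assumption, not from the talk):

```python
import numpy as np

# Transition matrix of the random walk on an assumed example graph:
# p(j|i) = W_ij / D_ii, where D_ii is the degree of vertex i.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # connectivity matrix
D = np.diag(W.sum(axis=1))                  # degree matrix, D_ii = sum_l W_il
P = np.linalg.solve(D, W)                   # P_ij = W_ij / D_ii

assert np.allclose(P.sum(axis=1), 1.0)      # each row is a probability distribution
```

Each row of P sums to one, so P is a valid stochastic matrix.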

SLIDE 5

Graph Laplacian

Smoothness on Graph
A possible criterion for smooth functions is that variations between adjacent values should be small:

Σ_{i∼j} (fi − fj)² = 2 Σi fi² Di − 2 Σ_{i∼j} fi fj = 2 f⊤(D − W) f =: 2 f⊤ L f,

where Di = Σj Wij is the diagonal degree normalization and i ∼ j runs over ordered pairs of adjacent vertices.

Special Case: Lattice in 2D
For regular lattices, −L is the discretization of the continuous Laplace operator Δ = Σi ∂²/∂xi².

Normalized Graph Laplacian
We rescale L by D to obtain L̃ := 1 − D^{−1/2} W D^{−1/2}. Note that 2·1 ⪰ L̃ ⪰ 0, i.e. the eigenvalues of L̃ lie in [0, 2].
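Both the smoothness identity and the eigenvalue bound on the normalized Laplacian can be verified numerically. A minimal sketch, using an assumed example graph (a 4-cycle):

```python
import numpy as np

# Graph Laplacian L = D - W, normalized Laplacian Lt = 1 - D^{-1/2} W D^{-1/2},
# and the identity sum_{i~j} (f_i - f_j)^2 = 2 f^T L f, where i~j runs over
# ordered pairs of adjacent vertices. The 4-cycle is an assumed example.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)    # a 4-cycle
d = W.sum(axis=1)                             # degrees D_i
L = np.diag(d) - W
Lt = np.eye(4) - W / np.sqrt(np.outer(d, d))  # normalized Laplacian

f = np.array([1.0, 2.0, 0.0, 3.0])
rough = sum(W[i, j] * (f[i] - f[j]) ** 2      # sum over ordered adjacent pairs
            for i in range(4) for j in range(4))
assert np.isclose(rough, 2 * f @ L @ f)

# eigenvalues of the normalized Laplacian lie in [0, 2]
ev = np.linalg.eigvalsh(Lt)
assert ev.min() > -1e-9 and ev.max() < 2 + 1e-9
```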

SLIDE 6

Invariance Theorem

Theorem
Denote by L ∈ R^{n×n} a symmetric matrix, given as a linear permutation-invariant function of the adjacency matrix W, i.e. L = T[W] with

Ππ⊤ T[W] Ππ = T[Ππ⊤ W Ππ] for all π ∈ Sn.

Then L is related to W by a linear combination of the following operations:
• identity
• row/column sums and overall sum
• row/column sums restricted to the diagonal of L

Consequence
This essentially only leaves the (normalized) graph Laplacian. An analogous result exists for the Laplace operator in R^n with respect to the Galilei group.

SLIDE 7

Proof Idea

Specifying the Operator T

L_{i1 i2} = T[W]_{i1 i2} := Σ_{i3, i4 = 1}^{n} T_{i1 i2 i3 i4} W_{i3 i4}

Permutation invariance implies T_{π(i1) π(i2) π(i3) π(i4)} = T_{i1 i2 i3 i4} for any π ∈ Sn.

Picking Matching Terms
For every matching set of indices, the corresponding entries in the tensor T have to agree: e.g. if the first and second indices in T agree (i1 = i2), then they also agree in T_{π(i1) π(i2) π(i3) π(i4)}, that is π(i1) = π(i2).

Matching
Interpret the remaining terms of T as per the theorem.

SLIDE 8

Regularization and Kernels

Regularization on f
Given f ∈ R^n we want some matrix M ⪰ 0 to define the regularizer f⊤Mf.

Self-Consistency Condition
In an RKHS we have the condition

⟨k(x, ·), M k(x′, ·)⟩ = k(x, x′)

In matrix notation this can be rewritten as KMK = K and therefore K = M†, where M† is the pseudoinverse of M.

"Kernel Expansion"
For the expansion f = Kα we have f⊤Mf = α⊤KMKα = α⊤Kα.
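The self-consistency condition and the kernel expansion can be illustrated with a generic positive semidefinite M. A sketch (the random matrix below is purely an assumed example):

```python
import numpy as np

# Self-consistency K M K = K with K = M^+ (Moore-Penrose pseudoinverse),
# for an assumed positive semidefinite M built from a random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = A @ A.T                        # M >= 0
K = np.linalg.pinv(M)              # K = M^+
assert np.allclose(K @ M @ K, K)

# for the expansion f = K alpha, the regularizer reduces to alpha^T K alpha
alpha = rng.standard_normal(5)
f = K @ alpha
assert np.isclose(f @ M @ f, alpha @ K @ alpha)
```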

SLIDE 9

Using the Laplacian

Designing M from L, L̃
We want to penalize quickly varying functions on the graph more severely. The eigensystem of L or L̃ is a good guess for that: eigenvectors with small eigenvalues split the graph into large coherent clusters (e.g. the Fiedler vector).

Analogy from Regularization with Laplace Operators
⟨f, Mf⟩ = ⟨f, exp(−(σ²/2) Δ) f⟩ yields Gaussian kernels k(x, x′) = exp(−‖x − x′‖² / (2σ²)).
⟨f, Mf⟩ = ⟨f, exp(σL) f⟩ yields diffusion kernels K = exp(−σL).
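The diffusion kernel K = exp(−σL) can be computed through the eigendecomposition of the Laplacian. A minimal sketch on an assumed example graph (a 3-vertex path):

```python
import numpy as np

# Diffusion kernel K = exp(-sigma * L) via the eigendecomposition of L.
# The 3-vertex path graph is an assumed example, not from the talk.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
sigma = 1.0

lam, U = np.linalg.eigh(L)                   # L = U diag(lam) U^T
K = U @ np.diag(np.exp(-sigma * lam)) @ U.T  # K = exp(-sigma L)

# K is symmetric positive definite, hence a valid kernel matrix
assert np.allclose(K, K.T)
assert np.all(np.linalg.eigvalsh(K) > 0)
```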

SLIDE 10

Eigenvalue Remapping

General Connection
Use a monotonic r(λ) to define M = r(L). K is then given by K = r^{−1}(L). Big gain: r^{−1}(λ) may be cheap to compute.

Examples of r(λ)
• r(λ) = 1 + σλ (Regularized Laplacian)
• r(λ) = exp(σλ) (Diffusion Process)
• r(λ) = (a − λ)^{−p} with a ≥ 2 (p-Step Random Walk)
• r(λ) = (cos(λπ/4))^{−1} (Inverse Cosine)

Examples of K = r^{−1}(L)
• K = (1 + σL)^{−1} (Regularized Laplacian)
• K = (1 1⊤ + σL)^{−1} ("Google")
• K = exp(−σL) (Diffusion Process)
• K = (a1 − L)^p with a ≥ 2 (p-Step Random Walk)
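The remapping amounts to applying 1/r(λ) to each eigenvalue of the Laplacian. A sketch on the normalized Laplacian of an assumed small example graph; the `kernel` helper is illustrative, not from the talk:

```python
import numpy as np

# Eigenvalue remapping: M = r(Lt) penalizes rough functions, and
# K = r^{-1}(Lt) remaps each eigenvalue lam of the normalized Laplacian
# to 1/r(lam). The example graph (K4 minus an edge) is an assumption.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = W.sum(axis=1)
Lt = np.eye(4) - W / np.sqrt(np.outer(d, d))  # normalized graph Laplacian
lam, U = np.linalg.eigh(Lt)

def kernel(r):
    """Illustrative helper: K = r^{-1}(Lt) via the eigendecomposition."""
    return U @ np.diag(1.0 / r(lam)) @ U.T

sigma, a, p = 0.5, 2.0, 3
K_reg  = kernel(lambda t: 1 + sigma * t)                # regularized Laplacian
K_diff = kernel(lambda t: np.exp(sigma * t))            # diffusion process
K_walk = kernel(lambda t: (a - t) ** (-p))              # p-step random walk
K_cos  = kernel(lambda t: 1.0 / np.cos(t * np.pi / 4))  # inverse cosine

# closed forms agree with the remapped spectra
assert np.allclose(K_reg, np.linalg.inv(np.eye(4) + sigma * Lt))
assert np.allclose(K_walk, np.linalg.matrix_power(a * np.eye(4) - Lt, p))
```

The payoff the slide points at: r^{−1}(λ) (and hence K) is often cheap even when r itself describes an expensive penalty.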

SLIDE 11

Examples

[Figures: Regularized Graph Laplacian, Diffusion Kernel with σ = 5, and 4-Step Random Walk]

SLIDE 12

Connections to Clustering

Eigenvectors
In spectral clustering one decomposes G(V, E) according to the smallest eigenvectors of the Graph Laplacian:
• Small eigenvalues/eigenvectors correspond to large coherent parts of the graph.
• Large eigenvalues/eigenvectors yield incoherent components.

Kernels
The order of the eigenvalues is reversed. So Kernel-PCA on a Graph-Kernel finds small eigenvectors of the Graph Laplacian.
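The reversal of the eigenvalue order can be seen directly: for the regularized Laplacian kernel the spectrum is 1/(1 + λ), and the Fiedler vector recovers the clusters. A sketch on an assumed example graph (two triangles joined by one edge):

```python
import numpy as np

# The kernel K = (1 + L)^{-1} has eigenvalues 1/(1 + lambda), so the order of
# the Laplacian spectrum is reversed: leading directions of K are the small,
# cluster-revealing eigenvectors of L. Example graph (assumed): two triangles
# joined by a single bridge edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
K = np.linalg.inv(np.eye(6) + L)

lamL = np.linalg.eigvalsh(L)      # ascending
lamK = np.linalg.eigvalsh(K)      # ascending
assert np.allclose(np.sort(1.0 / (1.0 + lamL)), lamK)

# the Fiedler vector (second-smallest eigenvector of L) splits the clusters
fiedler = np.linalg.eigh(L)[1][:, 1]
assert np.all(np.sign(fiedler[:3]) == np.sign(fiedler[0]))
assert np.all(np.sign(fiedler[3:]) == -np.sign(fiedler[0]))
```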

SLIDE 13

Two Moons

SLIDE 14

Nearest Neighbor Graph

SLIDE 15

Inverse Graph Laplacian

SLIDE 16

Products of Graphs

Motivation
Often graphs are composed of simple parts, e.g. as products of simpler graphs. Example: hypercubes.

Goal
Compute K without paying the price of the larger graph.

Spectral Properties
For regular graphs, the eigenvalues of the product are weighted combinations of the eigenvalues of the factors (with factor degrees d and d′):

λ^{fact}_{j,l} = d/(d + d′) · λj + d′/(d + d′) · λ′l

Likewise, the eigenvectors are the Cartesian products of the eigenvectors of the factors:

e^{j,l}_{(i,i′)} = e^j_i · e′^l_{i′}

SLIDE 17

Product Tricks

Analytic Expressions

exp(−β(a + b)) = exp(−βa) · exp(−βb)

(A − (a + b))^p = Σ_{n=0}^{p} (p choose n) (A/2 − a)^n (A/2 − b)^{p−n}

So for diffusion processes we can simply take the product of the kernels over the factors.

Brute Force Theorem
If we can solve the parts more cheaply, we can compute the overall kernel by

K_{(j,j′),(l,l′)} = (1/2πi) ∮_C K_α(j, l) G′_{−α}(j′, l′) dα = Σ_v K_{λv}(j, l) e^v_{j′} e^v_{l′}

Open Problem
What to do if we do not have regular graphs.
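The factorization of the diffusion kernel over a Cartesian graph product can be checked directly: the product Laplacian is the Kronecker sum of the factor Laplacians, and the exponential of a Kronecker sum factorizes into a Kronecker product. A sketch with assumed example factors (an edge and a triangle):

```python
import numpy as np

# For a Cartesian product of graphs the Laplacian is the Kronecker sum
# kron(L1, I) + kron(I, L2); the two summands commute, so
# exp(-beta (L1 (+) L2)) = kron(exp(-beta L1), exp(-beta L2)).
# The factor graphs (K2 and K3) are assumed examples.
def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def expmat(A, s):
    """Matrix exponential exp(s*A) of a symmetric A via eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(np.exp(s * lam)) @ U.T

W1 = np.array([[0, 1], [1, 0]], dtype=float)                   # K2 (an edge)
W2 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # K3 (a triangle)
L1, L2 = laplacian(W1), laplacian(W2)

beta = 0.3
Lprod = np.kron(L1, np.eye(3)) + np.kron(np.eye(2), L2)  # product Laplacian
K_direct   = expmat(Lprod, -beta)                        # diffusion on product
K_factored = np.kron(expmat(L1, -beta), expmat(L2, -beta))
assert np.allclose(K_direct, K_factored)
```

So the kernel on the product graph never has to be formed at the cost of the large graph: it is assembled from the (much smaller) factor kernels.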

SLIDE 18

Outlook and Summary

What we have
• Extension of regularization operators to discrete domains.
• Connections to spectral clustering.
• Extensions of the diffusion kernel setting.

To Do
• Extensions of the regularization framework to directed graphs (e.g. for Smola & Vishwanathan).
• Stability results for vertex/edge removal.
• Approximate computation for large graphs and scale-free networks.

We are hiring. For details see www.nicta.com.au or contact Alex.Smola@anu.edu.au.