

SLIDE 1

GP regression on random graphs: Covariance functions and Bayes errors

P. Sollich¹ and Camille Coti¹,²

¹King's College London   ²Laboratoire de Recherche en Informatique, Université Paris-Sud


SLIDE 2

Outline

1. Motivation
2. Covariance functions on graphs: definition from graph Laplacian; analysis on regular graphs (tree approximation); effect of loops
3. Bayes errors and learning curves: approximations; effect of loops; effect of kernel parameters
4. Summary and outlook


SLIDE 3

Motivation

- GP regression over continuous spaces is relatively well understood [e.g. Opper & Malzahn]
- Discrete spaces occur in many applications: sequences, strings, etc.
- What can we say about GP learning on these?
- Focus on random graphs with finite connectivity as a paradigmatic case



SLIDE 5

Graph Laplacian

- Easiest to define from the graph Laplacian [Smola & Kondor 2003]
- Adjacency matrix: $A_{ij} = 0$ or $1$ depending on whether nodes $i$ and $j$ are connected
- For a graph with $V$ nodes, $A$ is a $V \times V$ matrix
- Consider undirected links ($A_{ij} = A_{ji}$) and no self-loops ($A_{ii} = 0$)
- Degree of node $i$: $d_i = \sum_{j=1}^{V} A_{ij}$
- Set $D = \mathrm{diag}(d_1, \ldots, d_V)$; the graph Laplacian is then defined as $L = \mathbf{1} - D^{-1/2} A D^{-1/2}$
- Spectral graph theory: $L$ has eigenvalues in $[0, 2]$ (a small code sketch follows)
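As a minimal sketch of this definition (numpy only; the triangle-graph check at the end is my own illustration, not from the talk):

```python
import numpy as np

def normalized_laplacian(A):
    """L = 1 - D^{-1/2} A D^{-1/2} for a symmetric 0/1 adjacency matrix
    with no self-loops (assumes no isolated nodes)."""
    d = A.sum(axis=1)                      # degrees d_i = sum_j A_ij
    Dinv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    return np.eye(len(A)) - Dinv_sqrt @ A @ Dinv_sqrt

# Tiny check on a triangle graph: the spectrum must lie in [0, 2].
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(np.linalg.eigvalsh(normalized_laplacian(A)))  # -> [0.  1.5  1.5]
```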


SLIDE 6

Graph covariance functions

Definition

- From the graph Laplacian, one can define covariance "functions" (really $V \times V$ matrices)
- Random walk kernel, $a \geq 2$: $C \propto (a\mathbf{1} - L)^p \propto \left[(a-1)\,\mathbf{1} + D^{-1/2} A D^{-1/2}\right]^p$
- Diffusion kernel: $C \propto \exp\!\left(-\tfrac{\sigma^2}{2} L\right) \propto \exp\!\left(\tfrac{\sigma^2}{2}\, D^{-1/2} A D^{-1/2}\right)$
- Useful to normalize so that $(1/V) \sum_i C_{ii} = 1$ (both kernels are sketched in code below)
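A sketch of both kernels with the normalization applied; `graph_kernels` and its default parameters are my own naming, not from the talk:

```python
import numpy as np
from scipy.linalg import expm

def graph_kernels(A, a=2.0, p=10, sigma2=1.0):
    """Random-walk kernel C ~ (a*1 - L)^p and diffusion kernel
    C ~ exp(-(sigma2/2) L), each normalized so (1/V) sum_i C_ii = 1."""
    V = len(A)
    d = A.sum(axis=1)
    M = A / np.sqrt(np.outer(d, d))        # D^{-1/2} A D^{-1/2}
    L = np.eye(V) - M
    C_rw = np.linalg.matrix_power(a * np.eye(V) - L, p)
    C_diff = expm(-0.5 * sigma2 * L)
    return C_rw / (np.trace(C_rw) / V), C_diff / (np.trace(C_diff) / V)
```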


SLIDE 7

Graph covariance functions

Interpretation

- A random walk on the graph has transition probability matrix $A_{ij} d_j^{-1}$ for the transition $j \to i$
- After $s$ steps: $(A D^{-1})^s = D^{1/2} \left(D^{-1/2} A D^{-1/2}\right)^s D^{-1/2}$
- Compare this with $C \propto \sum_{s=0}^{p} \binom{p}{s} (1/a)^s (1 - 1/a)^{p-s} \left(D^{-1/2} A D^{-1/2}\right)^s$
- So $D^{1/2} C D^{-1/2}$ is a random walk transition matrix, averaged over the distribution of the number of steps:
  - random walk kernel: $s \sim \mathrm{Binomial}(p, 1/a)$
  - diffusion kernel: $s \sim \mathrm{Poisson}(\sigma^2/2)$; the diffusion kernel is the limit $p, a \to \infty$ at constant $p/a = \sigma^2/2$
- (The binomial identity is checked numerically below.)
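The binomial-average identity can be checked numerically on any small graph; a sketch using a path graph (chosen by me so the degrees are not all equal and the $D^{\pm 1/2}$ factors actually matter):

```python
import numpy as np
from scipy.stats import binom

V, a, p = 6, 2.0, 5
A = np.zeros((V, V))
for i in range(V - 1):                      # path graph: degrees 1 and 2
    A[i, i + 1] = A[i + 1, i] = 1

d = A.sum(axis=1)
M = A / np.sqrt(np.outer(d, d))             # D^{-1/2} A D^{-1/2}
C = np.linalg.matrix_power((a - 1) * np.eye(V) + M, p) / a**p
T = A / d                                   # A D^{-1}: transition matrix j -> i

# average (A D^{-1})^s over s ~ Binomial(p, 1/a)
avg = sum(binom.pmf(s, p, 1 / a) * np.linalg.matrix_power(T, s)
          for s in range(p + 1))

lhs = np.diag(np.sqrt(d)) @ C @ np.diag(1 / np.sqrt(d))  # D^{1/2} C D^{-1/2}
print(np.allclose(lhs, avg))                # -> True
```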



SLIDE 9

Random regular graphs

- Regular graphs: every node has the same degree $d$
- Random graph ensemble: all graphs with given $V$ and $d$ are assigned the same probability
- Typical loops are then long ($\propto \ln V$) if $V$ is large, so locally these graphs are tree-like
- How do graph covariance functions then behave?
- Expect that after many random walk steps ($p \to \infty$) the kernel becomes uniform: $C_{ij} = 1$, all nodes fully correlated


SLIDE 10

Covariance functions on regular trees

- On regular trees, all nodes are equivalent (except for boundary effects)
- So the kernel $C_{ij}$ is a function only of the distance $\ell$ measured along the graph (the number of links between $i$ and $j$)
- Can calculate recursively over $p$: $C_{\ell,\,p=0} = \delta_{\ell,0}$ and
  $C_{0,p+1} = \left(1 - \tfrac{1}{a}\right) C_{0,p} + \tfrac{d}{ad}\, C_{1,p}$
  $C_{\ell,p+1} = \tfrac{1}{ad}\, C_{\ell-1,p} + \left(1 - \tfrac{1}{a}\right) C_{\ell,p} + \tfrac{d-1}{ad}\, C_{\ell+1,p} \quad (\ell \geq 1)$
- Normalize afterwards for each $p$ so that $C_{0,p} = 1$
- Let's see what happens for $d = 3$, $a = 2$ and increasing $p$ (a code sketch of the recursion follows)
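As a minimal numerical sketch of this recursion (not the authors' code; the parameter values are just the ones quoted on the slide):

```python
import numpy as np

def tree_kernel(d=3, a=2.0, p_max=500):
    """Iterate the recursion for C_{l,p} on a regular tree of degree d,
    renormalizing so C_{0,p} = 1 at every step. Returns array K[p, l]."""
    ell_max = p_max + 2          # kernel support grows by one link per step
    C = np.zeros(ell_max)
    C[0] = 1.0                   # C_{l,0} = delta_{l,0}
    K = [C.copy()]
    for _ in range(p_max):
        Cn = np.empty_like(C)
        Cn[0] = (1 - 1/a) * C[0] + (d / (a * d)) * C[1]
        Cn[1:-1] = C[:-2] / (a*d) + (1 - 1/a) * C[1:-1] + (d-1)/(a*d) * C[2:]
        Cn[-1] = C[-2] / (a*d) + (1 - 1/a) * C[-1]   # truncation boundary
        C = Cn / Cn[0]           # normalize C_{0,p} = 1
        K.append(C.copy())
    return np.array(K)

K = tree_kernel()
print(K[500, :6])  # short-distance values settle to a nonuniform limit
```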


SLIDE 11

Effect of increasing p

[Figure: normalized kernel $K_\ell$ versus distance $\ell$ for $p = 1, 2, 3, 4, 5, 10, 20, 50, 100, 200, 500$ and $p = \infty$; $a = 2$, $d = 3$]

The kernel does not become uniform even for $p \to \infty$.


SLIDE 12

What is going on?

Mapping to biased random walk

- Gather all the (equal) random walk probabilities over the shell of nodes at distance $\ell$:
  $S_{0,p} = C_{0,p}, \qquad S_{\ell,p} = d(d-1)^{\ell-1}\, C_{\ell,p}$
- The recursion $S_{\ell,p} \to S_{\ell,p+1}$ then represents a biased random walk in one dimension with a reflecting barrier at the origin: from any site $\ell \geq 1$ the walker steps right ($\ell \to \ell+1$) with probability $(d-1)/(ad)$, left ($\ell \to \ell-1$) with probability $1/(ad)$, and stays put with probability $1 - 1/a$; at the origin it stays with probability $1 - 1/a$ and steps to $\ell = 1$ with probability $1/a$


SLIDE 13

Random walk propagation

[Figure: $\ln S_{\ell,p}$ versus $\ell$ for $d = 3$, $a = 2$ and $p = 500, 1000, 2000, 5000$]

The walk moves $\ell \to \ell + 1$ with probability $(d-1)/(ad)$ and $\ell \to \ell - 1$ with probability $1/(ad)$, giving a mean drift of $(d-2)/(ad)$ per step; after $p$ steps $S_{\ell,p}$ therefore has its peak at $\ell = (p/a)(d-2)/d$ (verified numerically below).
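A quick check of the peak position by evolving the walk's distribution directly; this is a sketch using the boundary rules as reconstructed on the previous slide:

```python
import numpy as np

d, a, p = 3, 2.0, 5000
right, left, stay = (d - 1) / (a * d), 1 / (a * d), 1 - 1 / a
S = np.zeros(p + 2)
S[0] = 1.0
for _ in range(p):
    Sn = np.zeros_like(S)
    Sn[0] = stay * S[0] + left * S[1]                     # reflecting origin
    Sn[1] = (1 / a) * S[0] + stay * S[1] + left * S[2]
    Sn[2:-1] = right * S[1:-2] + stay * S[2:-1] + left * S[3:]
    Sn[-1] = right * S[-2] + stay * S[-1]
    S = Sn
print(np.argmax(S), (p / a) * (d - 2) / d)  # both come out near 833
```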


SLIDE 14

Converting back to $C_{\ell,p} \propto S_{\ell,p}/(d-1)^{\ell-1}$

[Figure: $\ln S_{\ell,p}$ (left) and the resulting kernel $K_\ell$ on a log scale (right) versus $\ell$, for $p = 100, 500, 2000, 5000$; $d = 3$, $a = 2$]

- The covariance function is determined by the tail of $S_{\ell,p}$ near the origin
- This can be used to calculate $C_{\ell,\,p\to\infty} = \left[1 + \ell(d-1)/d\right](d-1)^{-\ell/2}$



SLIDE 16

Effect of loops

- Eventually the approximation of ignoring loops must fail
- Estimate when this happens: a tree of depth $\ell$ has $V \approx d(d-1)^{\ell-1}$ nodes, so a regular graph can be tree-like at most out to $\ell \approx \ln V / \ln(d-1)$
- A random walk on the graph typically takes $p/a$ steps, so expect loop effects to appear in the covariance function around $p/a \approx \ln V / \ln(d-1)$
- Check by measuring the average of $K_1 = C_{ij}/\sqrt{C_{ii} C_{jj}}$ ($i, j$ nearest neighbours) on randomly generated graphs (a sketch of such a measurement follows)
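One way to run this check, as a sketch: networkx's `random_regular_graph` sampler stands in for whatever graph generator the authors used, and `V`, `p`, and the seed are arbitrary choices of mine.

```python
import numpy as np
import networkx as nx

def mean_K1(V=500, d=3, a=2.0, p=100, seed=0):
    """Average normalized covariance between nearest neighbours on one
    sampled random regular graph."""
    A = nx.to_numpy_array(nx.random_regular_graph(d, V, seed=seed))
    M = A / d                      # D^{-1/2} A D^{-1/2} for a regular graph
    C = np.linalg.matrix_power((a - 1) * np.eye(V) + M, p)
    K = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))
    i, j = np.nonzero(A)           # all linked pairs
    return K[i, j].mean()

print(mean_K1())
```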


SLIDE 17

Covariance function for neighbouring nodes

[Figure: average $K_1$ versus $p/a$ for $a = 2, 4$ with $V = 500$ and $V = \infty$ (tree approximation); $d = 3$; vertical line at $\ln V / \ln(d-1)$]

- Around this point, $K_1$ starts to get larger than in the tree approximation ($V \to \infty$)
- Results depend only on $p/a$ for large $p$, as expected



SLIDE 19

Bayes errors and learning curves

- The generalization error $\epsilon$ of GP regression can be expressed in terms of the covariance function for any given dataset
- Assume we have the correct prior (matched case); then $\epsilon$ is the Bayes error (loss = squared difference)
- Averaging over datasets of given size $n$ gives the learning curve $\epsilon(n)$
- Take the distribution of inputs to be uniform across the graph
- How does this depend on $n$, $V$, $d$ ($= 3$ here), $a$, $p$, and the noise variance $\sigma^2$? (A sketch of the dataset average follows.)
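In the matched case the Bayes error is the posterior variance averaged over test nodes, so it can be estimated by sampling datasets. A minimal sketch; `bayes_error`, `n_samples`, and the seed are my own choices, not from the talk:

```python
import numpy as np

def bayes_error(C, n, sigma2, n_samples=20, seed=0):
    """Matched-case Bayes error: posterior variance averaged over all nodes,
    then over random datasets of n inputs drawn uniformly (with replacement)."""
    rng = np.random.default_rng(seed)
    V = len(C)
    errs = []
    for _ in range(n_samples):
        idx = rng.integers(0, V, size=n)
        Kdd = C[np.ix_(idx, idx)] + sigma2 * np.eye(n)
        Kxd = C[:, idx]
        alpha = np.linalg.solve(Kdd, Kxd.T)
        errs.append((np.diag(C) - np.einsum('ij,ji->i', Kxd, alpha)).mean())
    return np.mean(errs)
```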


SLIDE 20

Some simulation results for orientation

[Figure: simulated learning curves $\epsilon$ versus $\nu = n/V$ for $\sigma^2 = 0.1, 0.01, 0.001, 0.0001$; $V = 500$, $d = 3$, $a = 2$, $p = 10$]

Two different regimes: $\epsilon > \sigma^2$ and $\epsilon < \sigma^2$.


SLIDE 21

Theory: Learning curve approximation

- Approximations for the learning curve are based on the kernel eigenvalues: $\langle C_{ij}\,\phi_j \rangle = \lambda \phi_i$, where $\langle \cdots \rangle$ is over the input distribution across nodes
- Try a simple but often accurate approximation (a solver sketch follows):
  $\epsilon = g\!\left(\frac{n}{\epsilon + \sigma^2}\right), \qquad g(h) = \sum_{\mu=1}^{V} \left(\lambda_\mu^{-1} + h\right)^{-1}$
- This has to be solved self-consistently; note that $g(0) = \sum_\mu \lambda_\mu = \langle C_{jj} \rangle = 1$
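The self-consistent equation is easy to solve by damped fixed-point iteration; a minimal sketch (the function name and damping factor are mine):

```python
import numpy as np

def learning_curve(eigvals, n, sigma2, tol=1e-12, max_iter=10000):
    """Solve eps = g(n/(eps + sigma2)), g(h) = sum_mu (1/lambda_mu + h)^{-1},
    by damped fixed-point iteration (eigvals assumed positive, summing to 1)."""
    lam = np.asarray(eigvals)
    eps = lam.sum()                     # start from g(0) = 1
    for _ in range(max_iter):
        new = np.sum(1.0 / (1.0 / lam + n / (eps + sigma2)))
        if abs(new - eps) < tol:
            break
        eps = 0.5 * (eps + new)         # damping for stability
    return eps
```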


SLIDE 22

Theory: Limit of large V

- For large $V$, the tree approximation should be accurate
- The tree Laplacian eigenvalue density is known:
  $\rho_L(\lambda) = \frac{d\,\sqrt{4(d-1)/d^2 - (\lambda-1)^2}}{2\pi\,\lambda(2-\lambda)}$
- Eigenvalues of the covariance function are then $\propto V^{-1}(a - \lambda)^p$
- Use this to evaluate approximate learning curves; they depend on $n$ and $V$ only through $\nu = n/V$ (see the integral version sketched below)
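Replacing the eigenvalue sum in $g(h)$ by an integral over $\rho_L$ makes $V$ drop out, leaving an equation in $\nu$ alone. A sketch under the assumptions above (kernel eigenvalues $(a-\lambda)^p/(VZ)$, normalized to sum to 1; function names are mine):

```python
import numpy as np
from scipy.integrate import quad

def rho_L(lam, d=3):
    """Tree (Kesten-McKay-type) eigenvalue density of the normalized Laplacian."""
    disc = 4 * (d - 1) / d**2 - (lam - 1)**2
    return d * np.sqrt(np.maximum(disc, 0.0)) / (2 * np.pi * lam * (2 - lam))

def learning_curve_infV(nu, sigma2, d=3, a=2.0, p=10, iters=200):
    """Self-consistent eps(nu) in the V -> infinity limit."""
    lo = 1 - 2 * np.sqrt(d - 1) / d           # support of rho_L
    hi = 1 + 2 * np.sqrt(d - 1) / d
    Z = quad(lambda l: rho_L(l, d) * (a - l)**p, lo, hi)[0]
    eps = 1.0                                 # g(0) = 1
    for _ in range(iters):
        g = quad(lambda l: rho_L(l, d) /
                 (Z * (a - l)**(-p) + nu / (eps + sigma2)), lo, hi)[0]
        eps = 0.5 * (eps + g)                 # damped fixed-point step
    return eps

print(learning_curve_infV(nu=1.0, sigma2=0.01))
```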


SLIDE 23

Eigenvalue spectra

[Figure: eigenvalue spectrum of $2 - L$ (i.e. $d = 3$, $a = 2$, $p = 1$) for $V = 2000$ versus $V = \infty$]

The tree approximation is quite accurate.


SLIDE 24

Comparison with simulations

[Figure: theory versus simulation, $\epsilon$ versus $\nu = n/V$ for $\sigma^2 = 0.1, 0.01, 0.001, 0.0001, 0$; $V = 500$, $d = 3$, $a = 2$, $p = 10$]

The approximation is accurate initially and for $\epsilon < \sigma^2$, less so in the crossover region.


SLIDE 25

Scaling with n/V

[Figure: $\epsilon$ versus $\nu = n/V$ for $V = 500$ (filled symbols) and $V = 1000$ (empty symbols), $\sigma^2 = 0.1, 0.01, 0.001, 0.0001, 0$; $d = 3$, $a = 2$, $p = 10$]

The scaling with $\nu$ works well throughout.



SLIDE 27

Effect of loops for large p

- The tree approximation must break down as $p$ increases, when loops become important
- Eventually, when the covariance function is uniform, only one function value has to be learned, so expect $\epsilon = \frac{1}{1 + n/\sigma^2}$
- Consider a case with $V = 500$, $p = 200$, $a = 2$, $d = 3$
- Compare to the naive estimate and to the approximation based on the true kernel eigenvalues (a comparison sketch follows)
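A self-contained sketch comparing the naive fully-correlated estimate with the Bayes error measured on one sampled dataset per $n$ (graph generator and seeds are arbitrary choices of mine):

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
V, d, a, p, sigma2 = 500, 3, 2.0, 200, 0.1
A = nx.to_numpy_array(nx.random_regular_graph(d, V, seed=1))
C = np.linalg.matrix_power((a - 1) * np.eye(V) + A / d, p)
C /= np.trace(C) / V                       # normalize (1/V) sum_i C_ii = 1
for n in (10, 100, 1000):
    idx = rng.integers(0, V, size=n)       # uniform inputs, with replacement
    Kdd = C[np.ix_(idx, idx)] + sigma2 * np.eye(n)
    Kxd = C[:, idx]
    eps = (np.diag(C) - np.einsum('ij,ji->i', Kxd,
                                  np.linalg.solve(Kdd, Kxd.T))).mean()
    print(n, eps, 1 / (1 + n / sigma2))    # measured vs naive estimate
```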


SLIDE 28

Simulations vs theory for large p

[Figure: $\epsilon$ versus $n$ for simulation, theory, and the naive estimate $1/(1 + n/\sigma^2)$; $V = 500$, $d = 3$, $a = 2$, $p = 200$, $\sigma^2 = 0.1$]

- The naive estimate is poor even though $\lambda_1 \approx 0.994$
- The theory works well; the tail is $\propto (\sigma^2/n)\ln n$



SLIDE 30

Effect of increasing p

[Figure: $\epsilon$ versus $\nu = n/V$ for $\sigma^2 = 0.1, 0.01, 0.001, 0.0001, 0$; $V = 500$, $d = 3$, $a = 2$, $p = 20$]

The theory becomes more accurate as $p$ increases.


SLIDE 31

(Approximate) predictions for large p

- Comparison with simulation shows that the theory becomes more accurate
- For large $p$, the learning curve tail ($\epsilon \ll \sigma^2$) is found to decay as
  $\epsilon \sim \frac{c\,\sigma^2}{\nu}\, \ln^{3/2}\!\left(\frac{\nu}{c\,\sigma^2}\right), \qquad c \sim (p/a)^{-3/2}$
- So the density $\nu$ needed to reach a given $\epsilon$ decays as $c \sim p^{-3/2}$
- Even though the kernel $C_{\ell,p}$ at fixed graph distance becomes $p$-independent for large $p$, learning still gets faster
- Presumably an effect of the kernel values at large distances $\ell \sim p$?


SLIDE 32

Effect of increasing a

[Figure: $\epsilon$ versus $\nu = n/V$ for $\sigma^2 = 0.1, 0.01, 0.001, 0.0001, 0$; $V = 500$, $d = 3$, $a = 4$, $p = 10$]

The theory becomes less accurate as $a$ increases.


SLIDE 33

Effect of increasing $a$: Limit $a \to \infty$

- Increasing $a$ means the typical number of random walk steps, $p/a$, decreases
- The extreme limit $a \to \infty$ gives $C_{ij} = \delta_{ij}$: all nodes uncorrelated
- The approximation then predicts
  $\epsilon = \frac{1}{2}\left(1 - \nu - \sigma^2\right) + \sqrt{\frac{1}{4}\left(1 - \nu - \sigma^2\right)^2 + \sigma^2}$
- Compare the exact result: $\epsilon = \left\langle \left(1 + n_i/\sigma^2\right)^{-1} \right\rangle$, with $n_i \sim \mathrm{Binomial}(n, 1/V)$
- In the low-noise limit $\sigma^2 \to 0$ these become (for large $V$) $\epsilon = 1 - \nu$ vs. $\epsilon = \exp(-\nu)$, so the approximation gives an underestimate (both formulas are evaluated in the sketch below)
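Both closed forms are easy to evaluate side by side; a sketch (function names and the $\nu$ grid are mine):

```python
import numpy as np
from scipy.stats import binom

def eps_approx(nu, sigma2):
    """Self-consistency prediction for C = identity (a -> infinity)."""
    t = 1 - nu - sigma2
    return 0.5 * t + np.sqrt(0.25 * t**2 + sigma2)

def eps_exact(n, V, sigma2):
    """Exact average error <(1 + n_i/sigma2)^{-1}>, n_i ~ Binomial(n, 1/V)."""
    k = np.arange(n + 1)
    return np.sum(binom.pmf(k, n, 1 / V) / (1 + k / sigma2))

V, sigma2 = 500, 1e-4
for nu in (0.1, 0.5, 1.0, 2.0):
    print(nu, eps_approx(nu, sigma2), eps_exact(int(nu * V), V, sigma2))
```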


SLIDE 34

Limit $a \to \infty$

[Figure: exact versus approximate $\epsilon$ as a function of $\nu = n/V$ for $a = \infty$, $\sigma^2 = 0.0001$]

Same "shape" of deviation as before for larger finite $a$.


SLIDE 35

Summary and outlook

- Kernels on graphs have some counter-intuitive properties
- Function values on different nodes only become fully correlated due to loop effects
- Nontrivial limiting kernel shape ($p \to \infty$) on regular trees, which can be obtained from a biased random walk
- For not-too-large $p$, learning curves scale with $\nu = n/V$
- For large $p$, loops give the fully-correlated limit, but with significant corrections
- The simple approximation works well except for small $p/a$, in the crossover region ($\epsilon \approx \sigma^2$)
- Future work: prior mismatch? Other graph structures (Poisson, small-world, etc.)?
