
SLIDE 1

Random Sampling of Bandlimited Signals on Graphs

Pierre Vandergheynst
École Polytechnique Fédérale de Lausanne (EPFL), School of Engineering & School of Computer and Communication Sciences

Joint work with Gilles Puy (INRIA), Nicolas Tremblay (INRIA) and Rémi Gribonval (INRIA)

NIPS 2015 Workshop on Multiresolution Methods for Large Scale Learning

SLIDE 2

Motivation

  • Point clouds
  • Energy networks
  • Transportation networks
  • Biological networks
  • Social networks

SLIDE 3

Goal

Given partially observed information at the nodes of a graph, can we robustly and efficiently infer the missing information?

  • What signal model?
  • What influence does the structure of the graph have?
  • How many observations are needed?

SLIDE 4

Notations

G = {V, E, W}: a weighted, undirected graph
  • V: the set of n nodes
  • E: the set of edges
  • W ∈ R^{n×n}: the weighted adjacency matrix
  • D: the diagonal degree matrix, with entries d_i := Σ_{j≠i} W_{ij}
  • combinatorial graph Laplacian L := D − W ∈ R^{n×n}
  • normalised Laplacian L := I − D^{−1/2} W D^{−1/2}


SLIDE 6

Notations

L is real, symmetric and PSD:
  • orthonormal eigenvectors U ∈ R^{n×n} (the graph Fourier matrix)
  • non-negative eigenvalues λ_1 ≤ λ_2 ≤ … ≤ λ_n, so that L = U Λ Uᵀ

k-bandlimited signals: x ∈ R^n with Fourier coefficients x̂ = Uᵀ x satisfying

x = U_k x̂_k,  x̂_k ∈ R^k,  U_k := (u_1, …, u_k) ∈ R^{n×k}

(first k eigenvectors only)
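The notation above maps directly to code. This is a minimal numpy sketch (the small random graph is an arbitrary stand-in, not from the slides): build L = D − W, eigendecompose it, and synthesise a k-bandlimited signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small weighted, undirected graph: symmetric W with zero diagonal.
n, k = 20, 4
W = np.triu(rng.random((n, n)) * (rng.random((n, n)) < 0.5), 1)
W = W + W.T

D = np.diag(W.sum(axis=1))      # diagonal degree matrix, d_i = sum_{j != i} W_ij
L = D - W                       # combinatorial graph Laplacian

# Graph Fourier matrix: eigh returns eigenvalues in ascending order.
lam, U = np.linalg.eigh(L)
Uk = U[:, :k]                   # first k eigenvectors (the low graph frequencies)

# A k-bandlimited signal and its graph Fourier transform.
x = Uk @ rng.standard_normal(k)
x_hat = U.T @ x

# All spectral energy lies in the first k coefficients.
print(float(np.linalg.norm(x_hat[k:])))
```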


SLIDE 9

Sampling Model

Sampling distribution p ∈ R^n: p_i > 0 and ‖p‖_1 = Σ_{i=1}^n p_i = 1; P := diag(p) ∈ R^{n×n}

Draw m samples independently (random sampling):

P(ω_j = i) = p_i,  ∀ j ∈ {1, …, m}, ∀ i ∈ {1, …, n}

Observations: y_j := x_{ω_j}, ∀ j ∈ {1, …, m}, i.e. y = Mx
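A minimal sketch of this sampling model (the signal and distribution are arbitrary stand-ins): draw m indices i.i.d. from p and form the selection matrix M.

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 50, 15
x = rng.standard_normal(n)            # stand-in for a signal on the nodes

# Sampling distribution p: positive entries summing to one.
p = rng.random(n) + 0.1
p /= p.sum()

# Draw m node indices independently: P(omega_j = i) = p_i.
omega = rng.choice(n, size=m, p=p)

# Sampling matrix M with M[j, omega_j] = 1, so that y = M x.
M = np.zeros((m, n))
M[np.arange(m), omega] = 1.0
y = M @ x

print(np.allclose(y, x[omega]))       # y_j = x_{omega_j}
```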


SLIDE 12

Sampling Model

Local quantity: ‖U_kᵀ δ_i‖_2 (note ‖δ_i‖_2 = 1, so no normalisation is needed). It measures how much a perfect impulse at node i can be concentrated on the first k eigenvectors, and carries interesting information about the graph.

Ideally: p_i should be large wherever ‖U_kᵀ δ_i‖_2 is large.

Graph coherence:

ν_p^k := max_{1 ≤ i ≤ n} p_i^{−1/2} ‖U_kᵀ δ_i‖_2

Remark: ν_p^k ≥ √k.
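The coherence and its lower bound can be checked numerically. A sketch with uniform p on a ring graph (an assumed example, not from the slides); the bound follows because the squared row norms of U_k sum to k.

```python
import numpy as np

n, k = 30, 5

# Ring graph: node i connected to i+1 (mod n) with unit weight.
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx + 1) % n] = 1.0
W = W + W.T
L = np.diag(W.sum(axis=1)) - W

_, U = np.linalg.eigh(L)
Uk = U[:, :k]

# Uniform sampling distribution.
p = np.full(n, 1.0 / n)

# ||U_k^T delta_i||_2 is the norm of the i-th row of U_k.
local = np.linalg.norm(Uk, axis=1)

# Graph coherence nu_p^k = max_i p_i^{-1/2} ||U_k^T delta_i||_2.
nu = np.max(local / np.sqrt(p))

# sum_i ||U_k^T delta_i||_2^2 = k forces (nu_p^k)^2 >= k.
print(nu**2, k)
```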


SLIDE 16

Stable Embedding

Theorem 1 (Restricted isometry property). Let M be a random subsampling matrix with the sampling distribution p. For any δ, ε ∈ (0, 1), with probability at least 1 − ε,

(1 − δ) ‖x‖_2^2 ≤ (1/m) ‖M P^{−1/2} x‖_2^2 ≤ (1 + δ) ‖x‖_2^2    (1)

for all x ∈ span(U_k), provided that

m ≥ (3/δ^2) (ν_p^k)^2 log(2k/ε).    (2)

Remarks:
  • Since P is diagonal, M P^{−1/2} x is just a re-weighting of Mx: only M is needed at acquisition time, and the re-weighting can be applied offline.
  • (ν_p^k)^2 ≥ k, so we need to sample at least k nodes.
  • The proof is similar to compressed sensing in bounded orthonormal bases, but simpler since the model is a subspace (not a union of subspaces).


SLIDE 20

Stable Embedding

(ν_p^k)^2 ≥ k: need to sample at least k nodes. Can we reduce to the optimal amount?

Variable Density Sampling:

p*_i := ‖U_kᵀ δ_i‖_2^2 / k,  i = 1, …, n

is such that (ν_{p*}^k)^2 = k, and depends on the structure of the graph.

Corollary 1. Let M be a random subsampling matrix constructed with the sampling distribution p*. For any δ, ε ∈ (0, 1), with probability at least 1 − ε,

(1 − δ) ‖x‖_2^2 ≤ (1/m) ‖M P^{−1/2} x‖_2^2 ≤ (1 + δ) ‖x‖_2^2

for all x ∈ span(U_k), provided that m ≥ (3/δ^2) k log(2k/ε).
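Both claims about p* can be verified directly; a small sketch (an arbitrary symmetric PSD matrix stands in for L, which is all the argument needs):

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 40, 6
S = rng.standard_normal((n, n))
L = S @ S.T                        # any real symmetric PSD stand-in for the Laplacian
_, U = np.linalg.eigh(L)
Uk = U[:, :k]

# p*_i = ||U_k^T delta_i||_2^2 / k; it sums to 1 since the squared
# row norms of U_k sum to k.
sq = np.linalg.norm(Uk, axis=1) ** 2
p_star = sq / k

# With p = p*, p_i^{-1} ||U_k^T delta_i||_2^2 = k at every node where
# sq_i > 0, so the coherence is exactly optimal: (nu^k_{p*})^2 = k.
mask = sq > 0
nu2 = np.max(sq[mask] / p_star[mask])
print(p_star.sum(), nu2)
```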


SLIDE 24

Recovery Procedures

Observation model: y = Mx + n, with x ∈ span(U_k) and y ∈ R^m; M provides a stable embedding.

Standard Decoder:

min_{z ∈ span(U_k)} ‖P^{−1/2} (Mz − y)‖_2

  • needs the projector onto span(U_k)
  • re-weighting by P^{−1/2} for the RIP
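Once U_k is available, the standard decoder is a small re-weighted least-squares problem. A sketch under an assumed toy setup (noiseless observations, uniform p): write z = U_k a and solve for the coefficients a.

```python
import numpy as np

rng = np.random.default_rng(4)

n, k, m = 40, 4, 20
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)
Uk = U[:, :k]

x = Uk @ rng.standard_normal(k)        # ground-truth k-bandlimited signal

p = np.full(n, 1.0 / n)
omega = rng.choice(n, size=m, p=p)
y = x[omega]                           # noiseless observations y = M x

# min_{z in span(U_k)} ||P^{-1/2}(Mz - y)||_2: substitute z = U_k a and
# solve the weighted least squares in a.
w = 1.0 / np.sqrt(p[omega])            # re-weighting P^{-1/2} on sampled rows
A = w[:, None] * Uk[omega, :]
b = w * y
a, *_ = np.linalg.lstsq(A, b, rcond=None)
x_star = Uk @ a

print(np.linalg.norm(x_star - x))      # exact recovery in the noiseless case
```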


SLIDE 27

Recovery Procedures

Observation model: y = Mx + n, with x ∈ span(U_k) and y ∈ R^m; M provides a stable embedding.

Efficient Decoder:

min_{z ∈ R^n} ‖P^{−1/2} (Mz − y)‖_2^2 + γ zᵀ g(L) z

  • soft constraint on frequencies
  • efficient implementation
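The efficient decoder can be solved through its normal equations. A sketch with g(L) = L as a simple admissible penalty (an illustrative choice; the slides allow any non-negative g):

```python
import numpy as np

rng = np.random.default_rng(5)

n, k, m, gamma = 40, 4, 20, 1e-3
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)
x = U[:, :k] @ rng.standard_normal(k)  # k-bandlimited ground truth

p = np.full(n, 1.0 / n)
omega = rng.choice(n, size=m, p=p)
y = x[omega]

# Normal equations of min_z ||P^{-1/2}(Mz - y)||_2^2 + gamma z^T g(L) z
# with g(L) = L:  (M^T P^{-1} M + gamma L) z = M^T P^{-1} y.
MtPiM = np.zeros((n, n))
rhs = np.zeros(n)
for j, i in enumerate(omega):
    MtPiM[i, i] += 1.0 / p[i]          # repeated samples accumulate
    rhs[i] += y[j] / p[i]

z = np.linalg.solve(MtPiM + gamma * L, rhs)
print(np.linalg.norm(z - x) / np.linalg.norm(x))
```

No projector onto span(U_k) is needed; at scale the solve would be done iteratively (e.g. conjugate gradient), with g(L) applied through sparse matrix-vector products only.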


SLIDE 30

Analysis of Standard Decoder

Standard Decoder: min_{z ∈ span(U_k)} ‖P^{−1/2} (Mz − y)‖_2

Theorem 2. Let Ω be a set of m indices selected independently from {1, …, n} with sampling distribution p ∈ R^n, and M the associated sampling matrix. Let ε, δ ∈ (0, 1) and m ≥ (3/δ^2) (ν_p^k)^2 log(2k/ε). With probability at least 1 − ε, the following holds for all x ∈ span(U_k) and all n ∈ R^m.

i) Let x* be the solution of the Standard Decoder with y = Mx + n. Then

‖x* − x‖_2 ≤ 2/√(m(1 − δ)) · ‖P^{−1/2} n‖_2.    (1)

ii) There exist particular vectors n′ ∈ R^m such that the solution x* of the Standard Decoder with y = Mx + n′ satisfies

‖x* − x‖_2 ≥ 1/√(m(1 + δ)) · ‖P^{−1/2} n′‖_2.    (2)

Exact recovery in the noiseless case.



SLIDE 34

Analysis of Efficient Decoder

Efficient Decoder: min_{z ∈ R^n} ‖P^{−1/2} (Mz − y)‖_2^2 + γ zᵀ g(L) z, with g non-negative.

Graph filtering: for h : R → R, define

x_h := U diag(ĥ) Uᵀ x ∈ R^n,  ĥ := (h(λ_1), …, h(λ_n))ᵀ ∈ R^n.

The filter reshapes the Fourier coefficients.

For a polynomial filter p(t) = Σ_{i=0}^d α_i t^i:

x_p = U diag(p̂) Uᵀ x = Σ_{i=0}^d α_i L^i x.

Pick special polynomials and use e.g. recurrence relations for fast filtering (with sparse matrix-vector multiplies only).
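The identity x_p = Σ_i α_i L^i x is what makes filtering fast: only matrix-vector products are needed. A sketch in the plain monomial basis, checked against the spectral definition (Chebyshev recurrences would be the practical choice, as the slide suggests):

```python
import numpy as np

rng = np.random.default_rng(6)

n, d = 30, 3
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

alpha = rng.standard_normal(d + 1)     # polynomial coefficients alpha_0..alpha_d
x = rng.standard_normal(n)

# Accumulate sum_i alpha_i L^i x keeping only the vector L^i x,
# never forming the matrix power L^i.
xp = np.zeros(n)
v = x.copy()
for a in alpha:
    xp += a * v
    v = L @ v                          # next power of L applied to x

# Spectral definition: the filter reshapes the Fourier coefficients.
p_hat = sum(a * lam**i for i, a in enumerate(alpha))
xp_spectral = U @ (p_hat * (U.T @ x))

print(np.allclose(xp, xp_spectral))
```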


SLIDE 36

Analysis of Efficient Decoder

Efficient Decoder: min_{z ∈ R^n} ‖P^{−1/2} (Mz − y)‖_2^2 + γ zᵀ g(L) z

g non-negative and non-decreasing ⇒ penalizes high frequencies; favours the reconstruction of approximately bandlimited signals.

The ideal filter

i_{λ_k}(t) := 0 if t ∈ [0, λ_k], +∞ otherwise

yields the Standard Decoder.

SLIDE 37

Analysis of Efficient Decoder

Theorem 3. Let Ω, M, P, m be as before and M_max > 0 be a constant such that ‖M P^{−1/2}‖_2 ≤ M_max. Let ε, δ ∈ (0, 1). With probability at least 1 − ε, the following holds for all x ∈ span(U_k), all n ∈ R^m, all γ > 0, and all non-negative, non-decreasing polynomial functions g such that g(λ_{k+1}) > 0. Let x* be the solution of the Efficient Decoder with y = Mx + n. Then

‖α* − x‖_2 ≤ 1/√(m(1 − δ)) · [ (2 + M_max/√(γ g(λ_{k+1}))) ‖P^{−1/2} n‖_2 + (M_max √(g(λ_k)/g(λ_{k+1})) + √(γ g(λ_k))) ‖x‖_2 ],    (1)

and

‖β*‖_2 ≤ 1/√(γ g(λ_{k+1})) · ‖P^{−1/2} n‖_2 + √(g(λ_k)/g(λ_{k+1})) ‖x‖_2,    (2)

where α* := U_k U_kᵀ x* and β* := (I − U_k U_kᵀ) x*.


SLIDE 39

Analysis of Efficient Decoder

Noiseless case:

‖x* − x‖_2 ≤ 1/√(m(1 − δ)) · (M_max √(g(λ_k)/g(λ_{k+1})) + √(γ g(λ_k))) ‖x‖_2 + √(g(λ_k)/g(λ_{k+1})) ‖x‖_2

g(λ_k) = 0 and g non-decreasing imply perfect reconstruction.

Otherwise: choose γ as close as possible to 0 and seek to minimise the ratio g(λ_k)/g(λ_{k+1}). Choose the filter to increase the spectral gap? Clusters are of course good.

With noise: the error scales with ‖P^{−1/2} n‖_2 / ‖x‖_2.


SLIDE 43

Estimating the Optimal Distribution

Need to estimate ‖U_kᵀ δ_i‖_2^2.

Filter random signals r (with E(r rᵀ) = I) through the ideal low-pass filter b_{λ_k} with cutoff λ_k:

r_{b_{λ_k}} := U diag(b_{λ_k}(λ_1), …, b_{λ_k}(λ_n)) Uᵀ r = U_k U_kᵀ r.

Then

E (r_{b_{λ_k}})_i^2 = δ_iᵀ U_k U_kᵀ E(r rᵀ) U_k U_kᵀ δ_i = ‖U_kᵀ δ_i‖_2^2,

so from L filtered random signals r^1, …, r^L we estimate

p̃_i := Σ_{l=1}^L (r^l_{b_{λ_k}})_i^2 / Σ_{i=1}^n Σ_{l=1}^L (r^l_{b_{λ_k}})_i^2.

In practice, one may use a polynomial approximation of the ideal filter, with

L ≥ (C/δ^2) log(2n/ε).
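A sketch of this estimator (the ideal low-pass filter is applied exactly here for illustration; in practice it would be a polynomial approximation, as noted above):

```python
import numpy as np

rng = np.random.default_rng(7)

n, k, n_sig = 60, 5, 2000
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
Lap = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(Lap)
Uk = U[:, :k]

R = rng.standard_normal((n, n_sig))    # random signals with E(r r^T) = I
Rf = Uk @ (Uk.T @ R)                   # ideal low-pass: r -> U_k U_k^T r

energy = (Rf**2).sum(axis=1)           # sum_l (filtered r^l)_i^2 per node i
p_tilde = energy / energy.sum()        # estimated sampling distribution

# Compare with the exact optimal distribution p*_i = ||U_k^T delta_i||^2 / k.
p_star = np.linalg.norm(Uk, axis=1)**2 / k
print(np.abs(p_tilde - p_star).max())  # shrinks as n_sig grows
```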


SLIDE 46

Estimating the Eigengap

Again by low-pass filtering random signals, with cutoff λ and j* := #{eigenvalues ≤ λ}:

(1 − δ) Σ_{i=1}^n ‖U_{j*}ᵀ δ_i‖_2^2 ≤ Σ_{i=1}^n Σ_{l=1}^L (r^l_{b_λ})_i^2 ≤ (1 + δ) Σ_{i=1}^n ‖U_{j*}ᵀ δ_i‖_2^2

Since

Σ_{i=1}^n ‖U_{j*}ᵀ δ_i‖_2^2 = ‖U_{j*}‖_Frob^2 = j*,

we have

(1 − δ) j* ≤ Σ_{i=1}^n Σ_{l=1}^L (r^l_{b_λ})_i^2 ≤ (1 + δ) j*.

Dichotomy (binary search) on the filter bandwidth λ then locates λ_k.
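A sketch of the dichotomy (exact ideal filtering stands in for the polynomial approximation, and the random graph is an arbitrary stand-in): the total filtered energy estimates j*, so a binary search on the bandwidth converges to λ_k.

```python
import numpy as np

rng = np.random.default_rng(8)

n, k, n_sig = 50, 6, 3000
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
Lap = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(Lap)

def estimated_count(bw):
    """Total filtered energy / n_sig estimates j* = #{eigenvalues <= bw}."""
    Ub = U[:, lam <= bw]
    R = rng.standard_normal((n, n_sig))
    return np.sum((Ub.T @ R) ** 2) / n_sig

# Dichotomy on the bandwidth: find the smallest bw whose estimated count
# reaches k (threshold k - 0.5 absorbs the estimation noise).
lo, hi = 0.0, float(lam[-1])
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if estimated_count(mid) >= k - 0.5:
        hi = mid
    else:
        lo = mid

print(hi, lam[k - 1])                  # hi converges to lambda_k
```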

SLIDE 47

Experiments

[figure: unbalanced clusters]

SLIDES 48-50

Experiments

[figures: experimental results; one annotation reads "7%"]

slide-53
SLIDE 53

Compressive Spectral Clustering

24

Clustering equivalent to recovery of cluster assignment functions Well-defined clusters -> band-limited assignment functions! Generate features by filtering random signals

by Johnson-Lindenstrauss

⌘ = 4 + 2 ✏2/2 − ✏3/3 log n Use k-means on compressed data and feed into Efficient Decoder Each feature map is smooth, therefore keep m > 6 2 ⌫2

k log

✓ k ✏0 ◆
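A sketch of the feature-generation step on a toy two-cluster graph (an assumed stochastic-block-model-like setup; the ideal low-pass filter is applied exactly, and k-means itself is omitted; we only check that the filtered random features separate the clusters):

```python
import numpy as np

rng = np.random.default_rng(9)

n, k, d = 40, 2, 12                    # n nodes, k clusters, d random signals
labels = np.repeat([0, 1], n // 2)

# Two dense clusters with sparse cross-links.
P_in, P_out = 0.9, 0.05
prob = np.where(labels[:, None] == labels[None, :], P_in, P_out)
W = np.triu((rng.random((n, n)) < prob).astype(float), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)
Uk = U[:, :k]

# Features: d low-pass-filtered random signals; node i keeps the i-th row.
R = rng.standard_normal((n, d))
F = Uk @ (Uk.T @ R)

# Nodes in the same cluster get nearly identical feature rows.
c0 = F[labels == 0].mean(axis=0)
c1 = F[labels == 1].mean(axis=0)
within = max(np.linalg.norm(F[labels == 0] - c0, axis=1).mean(),
             np.linalg.norm(F[labels == 1] - c1, axis=1).mean())
between = np.linalg.norm(c0 - c1)
print(within, between)                 # within-cluster spread vs centroid gap
```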

SLIDE 54

Compressive Spectral Clustering

[figure: comparison of methods; annotations read "k log k" and "log k"]

SLIDE 59

Conclusion

  • Stable, robust and universal random sampling of smoothly varying information on graphs
  • Tractable decoder with guarantees
  • Optimal sampling distribution depends on graph structure
  • Can be used for inference and (SVD-less) compressive clustering
SLIDE 60

Thank you!