

SLIDE 1

Rethinking Sketching as Sampling: Efficient Approximate Solution to Linear Inverse Problems

Fernando Gama, A. G. Marques, G. Mateos & A. Ribeiro

Dept. of Electrical and Systems Engineering, University of Pennsylvania

fgama@seas.upenn.edu

GlobalSIP, December 9, 2016

Gama, Marques, Mateos, Ribeiro Rethinking Sketching as Sampling 1/23


SLIDE 4

PCA Classification

[Figure: sample handwritten-digit images, 28 × 28 pixels]

◮ Classify images according to the digits handwritten on them
◮ Perform PCA ⇒ Keep first few coefficients ⇒ Apply linear classifier

Image (n = 784) → PCA → few PCA coefficients (k = 20) → Linear Classifier → {0, 1}

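The pipeline above can be sketched numerically; the data here is random noise standing in for MNIST, and the least-squares fit is an illustrative stand-in for whatever linear classifier is actually used:

```python
import numpy as np

# Synthetic stand-in for the MNIST setup (n = 784 pixels, k = 20 PCA coeff.).
# All data here is random; it only illustrates PCA -> linear classifier.
rng = np.random.default_rng(0)
n, k, n_train = 784, 20, 200

X = rng.standard_normal((n_train, n))          # rows are vectorized images
labels = rng.integers(0, 2, n_train)           # binary labels {0, 1}

# PCA: leading principal directions of the centered data, keep the first k
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Vk = Vt[:k].T                                  # n x k
Z = Xc @ Vk                                    # k PCA coefficients per image

# Linear classifier in the PCA domain (least-squares fit, sign readout)
w = np.linalg.lstsq(Z, 2.0 * labels - 1.0, rcond=None)[0]
pred = (Z @ w > 0).astype(int)                 # predictions in {0, 1}
```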

SLIDE 5

PCA Classification


◮ Few PCA coefficients ⇒ Problem is inherently lower-dimensional
◮ Improves the classification task ⇒ Low-pass filter to remove noise
◮ Lower-dimensional representation can also save computational cost


SLIDE 6

Computational Cost


◮ Note that in performing PCA we need the complete image
◮ However, there are pixels that do not contribute to classification
  ⇒ Pixels on the border of the image, for example
◮ And there are pixels that are more important for classification
  ⇒ Pixels that are white in one image but black in the other




SLIDE 12

Sampling

◮ Few nonzero PCA coefficients ⇒ Bandlimited signal ⇒ Sampling
◮ Subspace representation on the covariance graph (not all pixels are useful)

⇒ Linear combination of a few eigenvectors weighted by PCA coeff.

◮ Extend to arbitrary graphs ⇒ Sampling of bandlimited graph signals
◮ Design a classifier to operate on the samples ⇒ Reduce dimensionality



SLIDE 14

Rethinking Sketching as Sampling

◮ Sketching ⇒ Reduce the dimensionality of linear transformations
◮ Projection onto a lower-dimensional subspace ⇒ Smaller matrix

⇒ Matrix sketch retains the most outstanding characteristics

◮ Smaller matrix operates on smaller vector to compute the result

⇒ Project vector on a lower-dimensional subspace ⇒ Sampling

◮ Jointly design sampling of signal and sketching of linear transform

⇒ Obtain approximate solution by operating only on few samples

[Block diagrams: the noisy signal x + w ∈ R^n is sampled by C ∈ R^{p×n} and sketched by Hs ∈ R^{m×p} to produce ŷ ∈ R^m, approximating the output y of the full transform H]


SLIDE 16

Sampling of Graph Signals

◮ Graph signals defined on top of a graph G = (V, E, W) with n nodes
◮ Irregular support captured by a normal graph shift operator S = VΛV^H
◮ Define the graph Fourier transform (GFT) x̃ = V^H x
  ⇒ Linear combination weighted by GFT coefficients: x = Vx̃ (iGFT)
◮ Bandlimited graph signal ⇒ x̃ = [x̃k; 0_{n−k}] with k ≪ n ⇒ x = Vk x̃k
  ⇒ Active eigenbasis Vk: the first k columns of V
◮ Signal is a linear combination of a few elements of Vk ⇒ Sampling

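A minimal numerical sketch of these definitions, assuming a symmetric (hence normal) shift operator so that a real eigendecomposition applies:

```python
import numpy as np

# Bandlimited graph signal and its GFT on a small synthetic graph.
rng = np.random.default_rng(0)
n, k = 8, 3

A = rng.random((n, n))
A = np.triu(A, 1)
A = A + A.T                    # symmetric adjacency as shift operator S
lam, V = np.linalg.eigh(A)     # S = V Lambda V^H, V orthonormal

xk = rng.standard_normal(k)
x = V[:, :k] @ xk              # bandlimited signal: x = Vk x~k

x_tilde = V.T @ x              # GFT: x~ = V^H x
# only the first k GFT coefficients are nonzero; the iGFT V x~ recovers x
```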


SLIDE 18

Sketching

◮ Estimate the input to a linear transform by measuring the output
  ⇒ The model is x = Hy, with H ∈ R^{n×m} and n ≫ m
  ⇒ LS solution ⇒ Computationally costly (pseudo-)inverse
◮ Traditional sketching ⇒ Reduce the dimension of the linear problem
◮ Compress H and x ⇒ KH and Kx, with K ∈ R^{p×n} random, p ≪ n
  ⇒ Random projection onto a lower-dimensional subspace
  ⇒ Solution of the smaller problem min_y ‖(KH)y − Kx‖²₂ ⇒ Faster
◮ Design K such that KH and Kx retain the important traits of the problem
  ⇒ Then, solving for (KH, Kx) yields a good approximation
◮ We consider a deterministic design to obtain a smaller matrix sketch

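Traditional sketch-and-solve can be illustrated with a random Gaussian K, one common choice (the slides argue for a deterministic design instead). All dimensions below are illustrative:

```python
import numpy as np

# Sketch-and-solve least squares: solve the p-dimensional problem instead
# of the n-dimensional one.
rng = np.random.default_rng(1)
n, m, p = 500, 10, 80

H = rng.standard_normal((n, m))                 # tall model, n >> m
y_true = rng.standard_normal(m)
x = H @ y_true + 0.01 * rng.standard_normal(n)  # noisy output x = Hy + w

K = rng.standard_normal((p, n)) / np.sqrt(p)    # random projection, p << n

y_full = np.linalg.lstsq(H, x, rcond=None)[0]            # full LS solution
y_sketch = np.linalg.lstsq(K @ H, K @ x, rcond=None)[0]  # sketched solution
```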

SLIDE 19

Operating Conditions

◮ Sequence of signals to be processed by the same linear transform

⇒ Matrix H is big ⇒ Computationally intensive to operate with

◮ Realizations of a bandlimited graph random process ⇒ Rx singular
◮ Enough computational power available prior to processing of the signals
◮ Process the sequence of signals fast ⇒ Apply a smaller matrix to samples
◮ Traditional sampling ⇒ Ignores further processing on the signal
◮ Traditional sketching ⇒ Recomputes the sketch for each realization x

[Diagram: each signal xᵢ ∈ R^{n×1} in the sequence is sampled, Cxᵢ ∈ R^{p×1}, then sketched, ŷᵢ = Hs Cxᵢ ∈ R^{m×1}]

Design C, Hs based on H and the statistics of the signal Rx and noise Rw


SLIDE 20

Problem Statement

◮ Design a sampling matrix C that selects k ≤ p ≪ n samples
◮ Design a deterministic sketch Hs to be applied directly to the samples
◮ Joint design of sketching and sampling prior to the start of the sequence

⇒ Minimize the MSE relative to using full H on the full signal x

◮ Processing of signals reduces to sampling and a matrix multiplication
◮ The computational cost of processing is reduced by a factor of p/n

[Block diagrams: inverse problem (x = Hy, H ∈ R^{n×m}) and direct problem (y = Hx, H ∈ R^{m×n}); in both, x + w is sampled by C ∈ R^{p×n} and sketched by Hs ∈ R^{m×p} to produce ŷ ∈ R^m]

SLIDE 21

Inverse Linear Problem

◮ Use the noisy output (x + w) ∈ R^n to estimate the input y ∈ R^m, x = Hy
◮ Linear model: H ∈ R^{n×m} is a tall matrix with m ≪ n and full rank
◮ Output signal x ∈ R^n is k-bandlimited with known Rx ⪰ 0 (singular)
◮ Input noise w, indep. of x, with known covariance matrix Rw ≻ 0
◮ Design the sketch H∗s ∈ R^{m×p} and a selection matrix C∗ ∈ R^{p×n}

{C∗, H∗s} := argmin_{C ∈ Cpn, Hs} E[ ‖H Hs C(x + w) − x‖²₂ ]

◮ Solve this problem before processing the sequence of signals

[Block diagram: x + w ∈ R^n → C ∈ R^{p×n} → Hs ∈ R^{m×p} → ŷ ∈ R^m, estimating the input y of x = Hy]

SLIDE 22

Inverse Linear Problem: Noisy case

◮ Two-stage optimization to solve min E[ ‖H Hs C(x + w) − x‖²₂ ]

1. Design the matrix sketch H∗s = H∗s(C), then replace in the objective function

   H∗s(C) = A_LS Rx C^T [C(Rx + Rw)C^T]⁻¹

   ⇒ This is the LS solution with a preprocessing step to deal with the noise

2. Define the auxiliary matrix G = H A_LS and obtain C∗ by solving

   min_{C ∈ Cpn} tr[ Rx − G Rx C^T [C(Rx + Rw)C^T]⁻¹ C Rx G^T ]

   ⇒ Tradeoff between the output energy and the noise of the selected samples
   ⇒ This is a binary optimization problem over the selection matrix C
   ⇒ There are (n choose p) possible solutions ⇒ Prohibitive to test all of them

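Stage 1 can be checked numerically. One assumption is made here: A_LS is taken to be the least-squares pseudoinverse of H, consistent with the remark that H∗s(C) is "the LS solution with a preprocessing to deal with the noise":

```python
import numpy as np

# Closed-form H*_s(C) for a fixed selection C, and the resulting MSE.
# Assumption: A_LS = pinv(H) (the LS pseudoinverse of H).
rng = np.random.default_rng(2)
n, m, k, p = 30, 6, 4, 8

H = rng.standard_normal((n, m))
U = np.linalg.qr(rng.standard_normal((n, k)))[0]
Rx = U @ U.T                                    # rank-k (bandlimited) covariance
Rw = 0.01 * np.eye(n)                           # full-rank noise covariance

idx = rng.choice(n, size=p, replace=False)
C = np.zeros((p, n))
C[np.arange(p), idx] = 1.0                      # selection matrix, one 1 per row

A_ls = np.linalg.pinv(H)                        # A_LS (assumption)
M = C @ (Rx + Rw) @ C.T
Hs = A_ls @ Rx @ C.T @ np.linalg.inv(M)         # H*_s(C)

G = H @ A_ls                                    # auxiliary matrix of stage 2
mse = np.trace(Rx - G @ Rx @ C.T @ np.linalg.inv(M) @ C @ Rx @ G.T)
```

The first assertion below verifies that Hs satisfies the normal equations of the objective; the second, that the stage-2 trace expression equals the MSE computed directly from the error covariance.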

SLIDE 23

Inverse Linear Problem: Noisy case

◮ Binary constraints are inherent to the selection problem
◮ Equivalent problem with a linear objective function and LMIs
  ⇒ It would be an SDP were it not for the binary constraint
◮ Observe that C^T C = diag(c) ⇒ Sampling vector c ∈ {0, 1}^n
◮ Define C̄α = diag(c)/α, α > 0 and R̄α = Rx + Rw − αIn
◮ The problem over C can be posed as an equivalent problem over c

min_{c ∈ {0,1}^n, Y, C̄α} tr[Y]
s.t. C̄α = α⁻¹ diag(c), c^T 1n = p,
[ Y − Rx + G Rx C̄α Rx G^T , G Rx C̄α ; C̄α Rx G^T , R̄α⁻¹ + C̄α ] ⪰ 0

◮ This is also a complicated problem, but slightly more tractable


SLIDE 24

Direct Linear Problem

◮ Given the noisy input (x + w) ∈ R^n, estimate the output y = Hx, y ∈ R^m
◮ Design the sketch Hs ∈ R^{m×p} and the p × n selection matrix C

{C∗, H∗s} := argmin_{C ∈ Cpn, Hs} E[ ‖Hs C(x + w) − Hx‖²₂ ]

◮ Two-stage optimization ⇒ Matrix sketch Hs and sampling scheme C
◮ Can be reformulated as an equivalent problem over the selection vector c
  ⇒ Linear objective function, LMI constraints, binary constraint

[Block diagram: x + w ∈ R^n → C ∈ R^{p×n} → Hs ∈ R^{m×p} → ŷ ∈ R^m, approximating y = Hx with H ∈ R^{m×n}]

SLIDE 25

Sampling Heuristics

◮ Solving the sampling problems may be intractable ⇒ Heuristic solutions
◮ Convex relaxation [O((n + m)^3.5)] ⇒ c ∈ [0, 1]^n ⇒ SDPs
  ⇒ Thresholding ⇒ Set the p highest values to 1 and the rest to 0
  ⇒ Random ⇒ Use the relaxed solution as a distribution to select nodes
◮ Noise-blind heuristic [O(n log n)] ⇒ p rows of Rx G^T with largest ‖·‖₂ in

min_{C ∈ Cpn} tr[ Rx − G Rx C^T [C(Rx + Rw)C^T]⁻¹ C Rx G^T ]

◮ Greedy approach [O(np(mnp + p³))] ⇒ Select the best node iteratively

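A sketch of the greedy approach, under the assumption that "select the best node iteratively" means adding, at each step, the node that most decreases the stage-2 objective. This plain implementation recomputes the objective from scratch; the stated complexity presumes faster incremental updates.

```python
import numpy as np

def greedy_select(Rx, Rw, G, p):
    """Greedily pick p sample indices minimizing
    tr[Rx - G Rx C^T (C(Rx+Rw)C^T)^-1 C Rx G^T]."""
    n = Rx.shape[0]
    R = Rx + Rw
    chosen = []
    for _ in range(p):
        best_j, best_cost = -1, np.inf
        for j in range(n):
            if j in chosen:
                continue
            idx = chosen + [j]
            C = np.zeros((len(idx), n))
            C[np.arange(len(idx)), idx] = 1.0     # candidate selection matrix
            M = C @ R @ C.T
            cost = np.trace(Rx) - np.trace(
                G @ Rx @ C.T @ np.linalg.solve(M, C @ Rx @ G.T))
            if cost < best_cost:
                best_j, best_cost = j, cost
        chosen.append(best_j)
    return chosen

# tiny usage example with synthetic covariances
rng = np.random.default_rng(3)
n, m, k, p = 15, 4, 3, 5
H = rng.standard_normal((n, m))
G = H @ np.linalg.pinv(H)                          # G = H A_LS
U = np.linalg.qr(rng.standard_normal((n, k)))[0]
Rx, Rw = U @ U.T, 0.01 * np.eye(n)
sel = greedy_select(Rx, Rw, G, p)
```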


SLIDE 27

Example: Computing the Graph Fourier Transform

◮ Consider a bandlimited graph signal x = Vk x̃k ⇒ x̃k: freq. coefficients
  ⇒ Inverse linear model x = Vk x̃k ⇒ Transform H = Vk
◮ Sequence of noisy signals (x + w) ⇒ Fast computation of the GFT
  ⇒ w: white Gaussian zero-mean noise with power prop. to the energy of x
◮ Compare the different heuristics proposed for the joint design
◮ Compare with other traditional sampling schemes for reconstruction
  ⇒ Experimentally Designed Sampling (EDS) technique
  ⇒ Assign to each node the norm of the corresponding row of Vk
  ⇒ Sample with replacement with a distribution prop. to this norm


SLIDE 28

Example: Approximating the GFT

◮ Erdős-Rényi graph of size n = 100 with edge probability 0.2
◮ Signal bandlimited with k = 10 freq. coefficients ⇒ p = k = 10

[Plot: relative MSE (10⁻⁵ to 10⁻²) vs. noise power σ²_coeff (10⁻⁵ to 10⁻³) for EDS norm-1, EDS norm-2, EDS norm-∞, convex relaxation (random and thresholding), the noise-blind heuristic, and greedy]

◮ Error of 2 · 10⁻⁵ while reducing the computational complexity by a factor of 10


SLIDE 29

Example: Approximating the GFT

◮ Erdős-Rényi graph of size n = 100 with edge probability 0.2
◮ Signal bandlimited with k = 10 freq. coefficients ⇒ σ²_coeff = 10⁻⁴

[Plot: relative MSE (10⁻⁵ to 10⁻²) vs. number of samples p (6 to 24) for the same eight sampling schemes]

◮ Error of 10⁻⁴ while reducing the computational complexity by a factor of 100/24 ≈ 4.17



SLIDE 32

Classification of handwritten digits

◮ Classify images of handwritten digits from the MNIST database
◮ Linear classifier in the PCA domain ⇒ Expensive linear operation

⇒ Subsume PCA and classifier in one linear operator

x (image, n = 784) → H = PCA + classifier → y

◮ Classify images by operating directly on a subset of pixels
◮ Images of size n = 784 pixels ⇒ Use only p = 20 pixels

⇒ Processing cost reduced by a factor of 784/20 = 39.2 for each image

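The claimed cost reduction is just the ratio of the per-image matrix products. The matrices below are random placeholders, not the designed C and Hs; only the shapes follow the slides:

```python
import numpy as np

# Per-image cost of full vs. sampled classification (shapes from the slides).
rng = np.random.default_rng(4)
n, m, p = 784, 10, 20

H = rng.standard_normal((m, n))                 # full operator: PCA + classifier
C = np.zeros((p, n))
C[np.arange(p), rng.choice(n, p, replace=False)] = 1.0  # pixel selection
Hs = rng.standard_normal((m, p))                # sketch applied to the samples

x = rng.standard_normal(n)                      # one vectorized image
y_full = H @ x                                  # m * n multiplies per image
y_fast = Hs @ (C @ x)                           # m * p multiplies per image

speedup = n / p                                 # 784 / 20 = 39.2
```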

SLIDE 33

Classification of handwritten digits

[Figures: pixels selected for (a) EDS norm-1, (b) EDS norm-2, (c) EDS norm-∞, (d) Thresholding, (e) Noise-Blind, (f) Greedy]

◮ Sketching and sampling techniques achieve perfect classification


SLIDE 34

Classification of handwritten digits

[Figures (a)-(f): EDS norm-1, EDS norm-2, EDS norm-∞, Thresholding, Noise-Blind, Greedy]

◮ Error rate using full image: 4.00%

⇒ Greedy approach using 20 pixels: 4.53%


SLIDE 35

Classification of handwritten digits

◮ Classification errors over 200 images as a function of the noise, for p = 20 pixels

[Plot: number of errors (1 to 5, out of 200 images) vs. noise power σ²_coeff (10⁻⁵ to 10⁻³) for EDS norm-1, EDS norm-2, EDS norm-∞, convex relaxation (random and thresholding), the noise-blind heuristic, and greedy]

SLIDE 36

Classification of handwritten digits

◮ Classification errors over 200 images as a function of the number of pixels

[Plot: number of errors (1 to 4, out of 200 images) vs. number of pixels p (16 to 35) for the same sampling schemes]

SLIDE 37

Conclusions

◮ Optimal sketch and sampling for processing bandlimited graph signals

⇒ Obtain approximate solution by operating only on a few samples ⇒ Accelerate processing of a sequence of bandlimited signals

◮ Joint design of matrix sketch and sampling scheme (prior to processing)

⇒ Two-stage optimization ⇒ Heuristic solutions for sampling problem

◮ Fast computation of GFT of a bandlimited graph signal

⇒ Errors on the order of 10⁻⁵ while reducing the cost 10 times

◮ Classification of images of size 784 pixels of handwritten digits

⇒ Using as few as 20 pixels ⇒ Roughly 40 times lower computational cost

◮ Journal version available on arXiv: arxiv.org/abs/1611.00119

