
slide-1
SLIDE 1

Rémi Gribonval Inria Rennes - Bretagne Atlantique

remi.gribonval@inria.fr

slide-2
SLIDE 2
  • R. GRIBONVAL - CSA 2015 - Berlin

Contributors & Collaborators

2

Anthony Bourrier, Nicolas Keriven, Yann Traonmilin, Tomer Peleg, Gilles Puy, Mike Davies, Patrick Perez, Gilles Blanchard

slide-3
SLIDE 3
  • R. GRIBONVAL - CSA 2015 - Berlin

Agenda

From Compressive Sensing to Compressive Learning?
Information-preserving projections & sketches
Compressive Clustering / Compressive GMM
Conclusion

3

slide-4
SLIDE 4
  • R. GRIBONVAL - CSA 2015 - Berlin

Machine Learning

Available data

Training collection of feature vectors = point cloud

Goals

Infer parameters to achieve a certain task; generalize to future samples drawn from the same probability distribution

Examples

  • PCA → principal subspace
  • Dictionary learning → dictionary
  • Clustering → centroids
  • Classification → classifier parameters (e.g. support vectors)

slide-10
SLIDE 10
  • R. GRIBONVAL - CSA 2015 - Berlin

Point cloud = large matrix of feature vectors X = [x1, x2, …, xN]

Challenging dimensions: high feature dimension n, large collection size N

Challenge: compress before learning?

slide-12
SLIDE 12
  • R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning?

Point cloud = large matrix of feature vectors. Reduce feature dimension: Y = MX

[Calderbank & al 2009, Reboredo & al 2013]

(Random) feature projection; exploits / needs a low-dimensional feature model

slide-14
SLIDE 14
  • R. GRIBONVAL - CSA 2015 - Berlin

Challenges of large collections

Feature projection Y = MX: limited impact (the collection size N is unchanged)

“Big Data” challenge: compress collection size

slide-18
SLIDE 18
  • R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning?

Point cloud = … empirical probability distribution. Reduce collection dimension

Coresets: see e.g. [Agarwal & al 2003, Feldman 2010]

Sketching & hashing: see e.g. [Thaper & al 2002, Cormode & al 2005]

Sketching operator M: X ↦ z ∈ R^m, nonlinear in the feature vectors, linear in their probability distribution

slide-19
SLIDE 19
  • R. GRIBONVAL - CSA 2015 - Berlin

Example: Compressive Clustering

X → M → z ∈ R^m → recovery algorithm → estimated centroids (vs. ground truth)

N = 1000; n = 2; m = 60

slide-20
SLIDE 20
  • R. GRIBONVAL - CSA 2015 - Berlin

Computational impact of sketching

Ph.D. A. Bourrier & N. Keriven

[Plots: computation time (s) and memory (bytes) vs. collection size N]

slide-26
SLIDE 26
  • R. GRIBONVAL - CSA 2015 - Berlin

The Sketch Trick

Data distribution: $X \sim p(x)$

Sketch: $z_\ell = \frac{1}{N}\sum_{i=1}^{N} h_\ell(x_i) \approx \mathbb{E}\,h_\ell(X) = \int h_\ell(x)\,p(x)\,dx$

The sketch is nonlinear in the feature vectors but linear in the distribution p(x): a linear “projection” of p.

Information preservation?

[Diagram: Signal Processing maps the signal space (x) through M to the observation space (y): inverse problems, compressive sensing. Machine Learning maps the probability space (p) through M to the sketch space (z): method of moments, compressive learning.]

slide-27
SLIDE 27
  • R. GRIBONVAL - CSA 2015 - Berlin

The Sketch Trick

Dimension reduction?

slide-28
SLIDE 28

Information preserving projections

slide-31
SLIDE 31
  • R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery

Linear “projection” M from the signal space $\mathbb{R}^n$ to the observation space $\mathbb{R}^m$, with $m \ll n$; $y = Mx$

Model set Σ = signals of interest. Example: the set of k-sparse vectors $\Sigma_k = \{x \in \mathbb{R}^n : \|x\|_0 \le k\}$

Recovery algorithm = “decoder” Δ. Ideal goal: build a decoder Δ with the guarantee (instance optimality [Cohen & al 2009])

$\|x - \Delta(Mx + e)\| \le C\,\|e\|, \quad \forall x \in \Sigma$

Are there such decoders?

slide-32
SLIDE 32
  • R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery of k-sparse vectors

Typical decoders

  • L1 minimization: $\Delta(y) := \arg\min_{x:\,Mx=y} \|x\|_1$ (LASSO [Tibshirani 1994], Basis Pursuit [Chen & al 1999])
  • Greedy algorithms: (Orthogonal) Matching Pursuit [Mallat & Zhang 1993], Iterative Hard Thresholding (IHT) [Blumensath & Davies 2009], …

Guarantees

Assume the Restricted Isometry Property [Candès & al 2004]: $1 - \delta \le \frac{\|Mz\|_2^2}{\|z\|_2^2} \le 1 + \delta$ whenever $\|z\|_0 \le 2k$

Then: exact recovery, stability to noise, robustness to model error
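The greedy decoders mentioned on this slide are easy to prototype. Below is a minimal IHT sketch in Python for recovering a k-sparse x from y = Mx; the step size, iteration count and toy dimensions are our own assumptions for illustration, not the tuned algorithm of [Blumensath & Davies 2009].

```python
import numpy as np

def iht(M, y, k, n_iter=200, step=None):
    """Minimal Iterative Hard Thresholding: estimate a k-sparse x with y ~ Mx.
    step defaults to 1/||M||_2^2 (a conservative, assumed choice)."""
    m, n = M.shape
    if step is None:
        step = 1.0 / np.linalg.norm(M, 2) ** 2
    x = np.zeros(n)
    for _ in range(n_iter):
        x = x + step * M.T @ (y - M @ x)      # gradient step on ||y - Mx||^2 / 2
        x[np.argsort(np.abs(x))[:-k]] = 0.0   # hard threshold: keep the k largest entries
    return x

# Toy check: a random Gaussian M satisfies the RIP w.h.p. for small enough k
rng = np.random.default_rng(0)
n, m, k = 200, 60, 5
M = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
print(np.linalg.norm(iht(M, M @ x_true, k) - x_true))  # small when recovery succeeds
```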

slide-38
SLIDE 38
  • R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery

Low-dimensional models

  • Sparse
  • Sparse in a dictionary D
  • Co-sparse in an analysis operator A: total variation, physics-driven sparse models …
  • Low-rank matrix or tensor: matrix completion, phase retrieval, blind sensor calibration …
  • Manifold / union of manifolds: detection, estimation, localization, mapping …
  • Matrix with sparse inverse: Gaussian graphical models
  • Given point cloud: database indexing
  • Gaussian Mixture Model …

Same setting: linear “projection” M from the signal space, now a general vector space H, to the observation space $\mathbb{R}^m$, $m \ll n$; model set Σ = signals of interest

slide-39
SLIDE 39
  • R. GRIBONVAL - CSA 2015 - Berlin

General stable recovery

Low-dimensional model: an arbitrary set $\Sigma \subset H$

Linear “projection” M from the signal space (a vector space H) to the observation space $\mathbb{R}^m$; model set Σ = signals of interest

Recovery algorithm = “decoder” Δ. Ideal goal: build a decoder Δ with the guarantee (instance optimality [Cohen & al 2009])

$\|x - \Delta(Mx + e)\| \le C\,\|e\|, \quad \forall x \in \Sigma$

Are there such decoders?

slide-43
SLIDE 43
  • R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery from arbitrary model sets

Definition: (general) Restricted Isometry Property (RIP) on the secant set $\Sigma - \Sigma := \{x - x',\ x, x' \in \Sigma\}$:

$\alpha \le \frac{\|Mz\|}{\|z\|} \le \beta \quad \text{for all } z \in \Sigma - \Sigma$

(up to renormalization, $\alpha = \sqrt{1-\delta}$ and $\beta = \sqrt{1+\delta}$)

Theorem 1: RIP is necessary. The RIP holds as soon as there exists an instance optimal decoder.

Theorem 2: RIP is sufficient. The RIP implies the existence of a decoder with performance guarantees: exact recovery, stability to noise, and (bonus) robustness to model error, measured by the distance $d(x, \Sigma)$ to the model set:

$\|x - \Delta(Mx + e)\| \le C(\delta)\,\|e\| + C'(\delta)\, d(x, \Sigma)$

[Cohen & al 2009] for $\Sigma_k$; [Bourrier & al 2014] for arbitrary $\Sigma$
slide-44
SLIDE 44

Compressive Learning Examples

slide-45
SLIDE 45
  • R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning

Point cloud = empirical probability distribution. Reduce collection dimension = sketching

Sketching operator M: $z_\ell = \frac{1}{N}\sum_{i=1}^{N} h_\ell(x_i), \quad 1 \le \ell \le m$, giving a sketch $z \in \mathbb{R}^m$

Choosing an information-preserving sketch?

slide-49
SLIDE 49
  • R. GRIBONVAL - CSA 2015 - Berlin

Example: Compressive Clustering

Goal: find k centroids

Standard approach = K-means

Sketching approach: p(x) is spatially localized, so we need “incoherent” sampling; choose Fourier sampling, i.e. sample the characteristic function:

$z_\ell = \frac{1}{N}\sum_{i=1}^{N} e^{j\omega_\ell^\top x_i}, \quad \omega_\ell \in \mathbb{R}^n$

How to choose the sampling frequencies $\omega_\ell$? See poster N. Keriven
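A minimal Python version of this sketching operator. How to draw the frequencies ω_ℓ is exactly the design question deferred to the poster, so the Gaussian draw below is only a placeholder assumption.

```python
import numpy as np

def sketch(X, Omega):
    """Empirical characteristic function sampled at the rows of Omega.
    X: (N, n) data matrix, Omega: (m, n) frequencies. Returns z in C^m with
    z_l = (1/N) * sum_i exp(j <omega_l, x_i>)."""
    return np.exp(1j * X @ Omega.T).mean(axis=0)

rng = np.random.default_rng(0)
N, n, m = 1000, 2, 60
X = rng.standard_normal((N, n)) + rng.choice([-3.0, 0.0, 3.0], size=(N, 1))  # toy clustered cloud
Omega = rng.standard_normal((m, n))   # placeholder frequency design (an assumption)
z = sketch(X, Omega)                  # m complex numbers summarize the whole point cloud
```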

slide-53
SLIDE 53
  • R. GRIBONVAL - CSA 2015 - Berlin

Example: Compressive Clustering

Goal: find k centroids. Sampled characteristic function; N = 1000; n = 2; m = 60

Density model = GMM with identity covariance: $p \approx \sum_{k=1}^{K} \alpha_k\, p_{\theta_k}$, hence $z = \mathcal{M}p \approx \sum_{k=1}^{K} \alpha_k\, \mathcal{M}p_{\theta_k}$

Recovery algorithm = “decoder” inspired by Iterative Hard Thresholding; estimated centroids vs. ground truth

Compressive Hierarchical Splitting (CHS): extension to general GMMs, similar to OMP with Replacement
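What makes this sketch matching tractable is that $\mathcal{M}p_\theta$ has a closed form for a Gaussian with identity covariance: its characteristic function is $\exp(j\omega^\top\mu - \|\omega\|^2/2)$. A sketch-level sanity check under toy assumptions follows; the helper names and parameter choices are ours, not from the talk.

```python
import numpy as np

def gmm_sketch(alphas, mus, Omega):
    """Closed-form sketch of a GMM with identity covariance:
    (M p)_l = sum_k alpha_k * exp(j <omega_l, mu_k> - ||omega_l||^2 / 2)."""
    phase = Omega @ mus.T                                            # (m, K)
    damp = np.exp(-0.5 * np.sum(Omega**2, axis=1, keepdims=True))    # (m, 1)
    return (damp * np.exp(1j * phase)) @ alphas

rng = np.random.default_rng(0)
K, n, m, N = 3, 2, 60, 100_000
mus = rng.uniform(-5, 5, size=(K, n))
alphas = np.full(K, 1.0 / K)
Omega = 0.3 * rng.standard_normal((m, n))        # assumed frequency scale
X = mus[rng.choice(K, size=N, p=alphas)] + rng.standard_normal((N, n))
z_emp = np.exp(1j * X @ Omega.T).mean(axis=0)    # empirical sketch of the data
print(np.abs(z_emp - gmm_sketch(alphas, mus, Omega)).max())  # small for large N
```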

slide-58
SLIDE 58
  • R. GRIBONVAL - CSA 2015 - Berlin

Application: Speaker Verification Results (DET curves)

MFCC coefficients $x_i \in \mathbb{R}^{12}$; N = 300 000 000 (~50 GB, ~1000 hours of speech)

After silence detection: N = 60 000 000

Maximum size manageable by EM: N = 300 000

[DET curves for EM and for CHS]

slide-61
SLIDE 61
  • R. GRIBONVAL - CSA 2015 - Berlin

Application: Speaker Verification Results (DET curves), CHS vs. EM (~50 GB, ~1000 hours of speech)

  • m = 500: close to EM; 7 200 000-fold compression (one QR code 40-L)
  • m = 1000: same as EM; 3 600 000-fold compression (two QR codes 40-L)
  • m = 5000: better than EM, exploits the whole collection; 720 000-fold compression (80 of them fit on a 3½" floppy disk)

slide-62
SLIDE 62

Computational Efficiency

slide-68
SLIDE 68
  • R. GRIBONVAL - CSA 2015 - Berlin

Computational Aspects

Sketching = empirical characteristic function: $z_\ell = \frac{1}{N}\sum_{i=1}^{N} e^{j\omega_\ell^\top x_i}$

Pipeline: X → WX → h(WX), with $h(\cdot) = e^{j(\cdot)}$, → average → z

~ One-layer random neural net; DNN ~ hierarchical sketching? See also [Bruna & al 2013, Giryes & al 2015]

Privacy-preserving: sketch and forget

slide-70
SLIDE 70
  • R. GRIBONVAL - CSA 2015 - Berlin

Computational Aspects

Sketching = empirical characteristic function (as above)

Streaming algorithms: one pass over the data, online update of the sketch
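Since the sketch is an average of per-item contributions, the one-pass update is immediate. A minimal sketch of such an update; the mini-batch interface is an assumption for illustration.

```python
import numpy as np

class StreamingSketch:
    """Maintains z = (1/N) sum_i exp(j Omega x_i) with a single pass over the data."""
    def __init__(self, Omega):
        self.Omega = Omega
        self.z_sum = np.zeros(Omega.shape[0], dtype=complex)
        self.count = 0

    def update(self, X_batch):
        # accumulate the new items' contributions, then the raw items can be discarded
        self.z_sum += np.exp(1j * X_batch @ self.Omega.T).sum(axis=0)
        self.count += X_batch.shape[0]

    @property
    def sketch(self):
        return self.z_sum / self.count
```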

slide-71
SLIDE 71
  • R. GRIBONVAL - CSA 2015 - Berlin

Computational Aspects

Sketching = empirical characteristic function (as above)

Distributed computing: decentralized (HADOOP) / parallel (GPU)
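The same averaging structure makes sketches mergeable across shards or devices: per-shard sketches combine exactly by a count-weighted average. A minimal illustration; the shard layout and names below are assumed.

```python
import numpy as np

def merge_sketches(sketches, counts):
    """Combine per-shard sketches z_s (each averaged over N_s items) into the global sketch."""
    counts = np.asarray(counts, dtype=float)
    return (np.asarray(sketches).T @ counts) / counts.sum()

# e.g. merge_sketches([z_shard1, z_shard2], [N1, N2]) equals the sketch of all the data pooled together
```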

slide-72
SLIDE 72

Conclusion

slide-73
SLIDE 73
  • R. GRIBONVAL - CSA 2015 - Berlin

Projections & Learning

  • Compressive sensing: random projections of data items; reduces the dimension of data items (signal space, x → M → y, observation space)
  • Compressive learning with sketches: random “projections” of collections; reduces the size of the collection (probability space, p → M → z, sketch space); nonlinear in the feature vectors, linear in their probability distribution

slide-74
SLIDE 74
  • R. GRIBONVAL - CSA 2015 - Berlin

Summary

Compressive clustering & compressive GMM

  • Bourrier, G. & Perez, Compressive Gaussian Mixture Estimation. ICASSP 2013
  • Keriven & G., Compressive Gaussian Mixture Estimation by Orthogonal Matching Pursuit with Replacement. SPARS 2015, Cambridge, United Kingdom
  • Keriven & al, Sketching for Large-Scale Learning of Mixture Models (draft)

Unified framework covering projections & sketches: instance optimal decoders and the Restricted Isometry Property

  • Bourrier & al, Fundamental performance limits for ideal decoders in high-dimensional linear inverse problems. IEEE Transactions on Information Theory, 2014

Challenge: compress before learning? Information preservation?

Details: poster N. Keriven

slide-75
SLIDE 75
  • R. GRIBONVAL - CSA 2015 - Berlin

Recent / ongoing work / challenges

  • Sufficient dimension for RIP: $m = O\big(d_B(\Sigma - \Sigma)\big)$. Puy, Davies & G., Recipes for stable linear embeddings from Hilbert spaces to R^m, hal-01203614; see also EUSIPCO 2015 and [Dirksen 2014]. Details: poster G. Puy
  • RIP for sketches in RKHS, applied to compressive GMM (upcoming, Keriven, Bourrier, Perez & G.)
  • Compressive statistical learning: intrinsic dimension of PCA and other related learning tasks (work in progress, Blanchard & G.)
  • RIP-based guarantees for general (convex & nonconvex) regularizers: Traonmilin & G., Stable recovery of low-dimensional cones in Hilbert spaces: One RIP to rule them all, arXiv:1510.00504. Extends the sharp RIP constant $1/\sqrt{2}$ [Cai & Zhang 2014] beyond sparsity (low-rank; block/structured …)

Dimension reduction? Decoders?

slide-76
SLIDE 76
  • R. GRIBONVAL - CSA 2015 - Berlin
  • Postdoc / R&D engineer positions @ IRISA

✓ theoretical and algorithmic foundations of large-scale machine learning & signal processing

✓ funded by ERC project PLEASE

Interested? Join the team

THANKS!

slide-77
SLIDE 77

THANKS!