Multiresolution Matrix Factorization Risi Kondor, The University of - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Multiresolution Matrix Factorization

Risi Kondor, The University of Chicago

Nedelina Teneva (UChicago), Pramod Mudrakarta (UChicago)

slide-2
SLIDE 2

Wavelets on graphs

  • Learning on graphs
  • Semi-supervised learning

[Shuman et al., 2013]

slide-3
SLIDE 3

Wavelets on graphs: recent work

  • Diffusion Wavelets [Coifman & Maggioni, 2006]
  • Treelets [Lee, Nadler & Wasserman, 2008]
  • Spectral graph wavelets [Hammond, Vandergheynst & Gribonval, 2010]
  • Tree wavelets [Gavish, Nadler & Coifman, 2010]
  • Laplacian eigenvector based wavelets [Irion & Saito, 2015]

slide-4
SLIDE 4

Wavelets on graphs ←→ Fast multilevel matrix algorithms

slide-5
SLIDE 5

Multiresolution analysis

slide-6
SLIDE 6

Fourier to Wavelets

Fourier basis:

  • Canonical (eigenfunctions of the translation operator / Laplacian).
  • Perfectly localized in frequency.
  • Perfectly delocalized in position.

Wavelets:

  • Generated from some mother wavelet by translations and dilations.
  • Localized in space and frequency.
  • Much better at resolving discontinuities.

slide-7
SLIDE 7

Multiresolution on R: wavelets

  • 1. Define the mother wavelet $\psi$.
  • 2. Define the basis $\psi^\ell_m(x) = 2^{-\ell/2}\, \psi(2^{-\ell} x - m)$.
  • 3. The wavelet transform of a function $f$ is $f(x) = \sum_{\ell,m} \alpha^\ell_m\, \psi^\ell_m(x) + \sum_m \beta_m\, \phi_m(x)$.
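
As a concrete instance of the dilation/translation formula for the wavelet basis, the sketch below takes the Haar function as the mother wavelet and checks numerically that wavelets at different shifts and levels are orthonormal. The choice of Haar, the integration grid, and the helper names `haar_mother`, `psi` and `inner` are assumptions made for illustration; the slides do not fix a particular ψ.

```python
import numpy as np

def haar_mother(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((0 <= x) & (x < 0.5), 1.0,
                    np.where((0.5 <= x) & (x < 1.0), -1.0, 0.0))

def psi(level, m, x):
    """psi^l_m(x) = 2^{-l/2} psi(2^{-l} x - m)."""
    return 2.0 ** (-level / 2) * haar_mother(2.0 ** (-level) * x - m)

# Midpoint-rule quadrature on [0, 8); the step is a power of two so that
# every breakpoint of the wavelets used below falls on a cell boundary.
n = 8 * 1024
dx = 8.0 / n
x = (np.arange(n) + 0.5) * dx

def inner(f, g):
    """Approximate L2 inner product of two sampled functions."""
    return float(np.sum(f * g) * dx)

norm_sq = inner(psi(1, 2, x), psi(1, 2, x))   # same wavelet with itself
shifted = inner(psi(1, 2, x), psi(1, 3, x))   # same level, different shift
cross = inner(psi(0, 0, x), psi(2, 0, x))     # different levels
```

Here `norm_sq` should be close to 1, while `shifted` and `cross` should vanish, which is the orthonormality that makes the expansion coefficients plain inner products.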

slide-8
SLIDE 8

More abstractly...

Repeatedly split the space of functions on X into the direct sum of a

  • Scaling space $V_{\ell+1}$ (with basis $\{\phi^{\ell+1}_m\}$)
  • Wavelet space $W_{\ell+1}$ (with basis $\{\psi^{\ell+1}_m\}$).

The key to fast wavelet transforms is that each orthogonal map $V_\ell \to V_{\ell+1} \oplus W_{\ell+1}$ is very sparse.
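
To make the sparsity of one splitting step concrete, the following sketch builds the matrix of a single Haar split on n samples: half of the rows produce scaling (average) coefficients, the other half produce wavelet (difference) coefficients, and each row has only two nonzeros. Using Haar here is an assumption for illustration, and `haar_level_matrix` is a hypothetical helper name.

```python
import numpy as np

def haar_level_matrix(n):
    """Orthogonal n x n matrix for one Haar split (n must be even).

    Rows 0 .. n/2-1 compute normalized pairwise sums (scaling part),
    rows n/2 .. n-1 compute normalized pairwise differences (wavelet part).
    """
    assert n % 2 == 0
    Q = np.zeros((n, n))
    s = 1.0 / np.sqrt(2.0)
    for k in range(n // 2):
        Q[k, 2 * k] = s
        Q[k, 2 * k + 1] = s
        Q[n // 2 + k, 2 * k] = s
        Q[n // 2 + k, 2 * k + 1] = -s
    return Q

Q = haar_level_matrix(8)
is_orthogonal = np.allclose(Q @ Q.T, np.eye(8))
nonzeros_per_row = np.count_nonzero(Q, axis=1)

f = np.array([4.0, 4.0, 2.0, 2.0, 1.0, 3.0, 5.0, 5.0])
coeffs = Q @ f
scaling, wavelet = coeffs[:4], coeffs[4:]
```

Because every row has two nonzeros, applying one level costs O(n) operations instead of O(n²); recursing on `scaling` alone gives the usual fast wavelet transform.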

slide-9
SLIDE 9

Multiresolution on R

Mallat [1989] showed (roughly) that if

  • 1. $\bigcap_\ell V_\ell = \{0\}$,
  • 2. $\bigcup_\ell V_\ell$ is dense in $L_2(\mathbb{R})$,
  • 3. If $f \in V_\ell$ then $f'(x) = f(x - 2^\ell m)$ is also in $V_\ell$ for any $m \in \mathbb{Z}$,
  • 4. If $f \in V_\ell$, then $f'(x) = f(2x)$ is in $V_{\ell-1}$,

then there is a mother wavelet $\psi$ and a father wavelet $\phi$ such that

$\psi^\ell_m(x) = 2^{-\ell/2}\, \psi(2^{-\ell} x - m)$ and $\phi^\ell_m(x) = 2^{-\ell/2}\, \phi(2^{-\ell} x - m)$.

slide-10
SLIDE 10

Multiresolution on discrete spaces

Which of the ideas from classical multiresolution still make sense?

  • Repeatedly split L(X) into smoother and rougher parts. ✓
  • Basis functions should be localized in space & frequency. ✓
  • Each $\Phi_\ell \xrightarrow{Q_\ell} \Phi_{\ell+1} \cup \Psi_{\ell+1}$ transform is orthogonal and sparse. ✓
  • Each $\psi^\ell_m$ is derived by translating $\psi^\ell$ → MAYBE
  • Each $\psi^\ell$ is derived by scaling $\psi$ → ???

slide-11
SLIDE 11

General principles

  • 1. The sequence $L(X) = V_0 \supset V_1 \supset V_2 \supset \ldots$ is a filtration of $\mathbb{R}^n$ in terms of smoothness with respect to $T$ in the sense that $\mu_\ell = \inf_{f \in V_\ell \setminus \{0\}} \langle f, Tf \rangle / \langle f, f \rangle$ increases at a given rate.
  • 2. The wavelets are localized in the sense that $\inf_{x \in X} \sup_{y \in X} \psi^\ell_m(y)\, d(x, y)^\alpha$ increases no faster than a certain rate.
  • 3. Letting $Q_\ell$ be the matrix expressing $\Phi_\ell \cup \Psi_\ell$ in the previous basis $\Phi_{\ell-1}$, i.e.,
    $\phi^\ell_m = \sum_{i=1}^{\dim(V_{\ell-1})} [Q_\ell]_{m,\,i}\; \phi^{\ell-1}_i$,
    $\psi^\ell_m = \sum_{i=1}^{\dim(V_{\ell-1})} [Q_\ell]_{m+\dim(V_\ell),\,i}\; \phi^{\ell-1}_i$,
    each orthogonal transform $Q_\ell$ is sparse, guaranteeing the existence of a fast wavelet transform ($\Phi_0$ is taken to be the standard basis, $\phi^0_m = e_m$).
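
The smoothness measure in the first principle can be evaluated numerically: if the columns of B form an orthonormal basis of a subspace, the infimum of the Rayleigh quotient over that subspace is the smallest eigenvalue of BᵀTB. The sketch below assumes, purely for illustration, that T is the smoothing operator (I + Laplacian)⁻¹ of a path graph and that V_1 is spanned by normalized pairwise averages; neither choice comes from the slides, and `rayleigh_inf` is a hypothetical helper.

```python
import numpy as np

def rayleigh_inf(T, B):
    """inf of <f, T f> / <f, f> over nonzero f in span(B).

    B must have orthonormal columns, so the infimum equals the smallest
    eigenvalue of the compressed operator B^T T B.
    """
    return float(np.linalg.eigvalsh(B.T @ T @ B)[0])

n = 8
# Path-graph Laplacian (assumed example graph)
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lap[0, 0] = lap[-1, -1] = 1.0
# Smoothing operator (assumed choice of T): large on smooth vectors
T = np.linalg.inv(np.eye(n) + lap)

B0 = np.eye(n)                                              # V_0 = R^n
B1 = np.kron(np.eye(n // 2), np.ones((2, 1))) / np.sqrt(2)  # V_1: pairwise averages

mu0 = rayleigh_inf(T, B0)
mu1 = rayleigh_inf(T, B1)
```

Since V_1 is contained in V_0, `mu1` can never be smaller than `mu0`; how fast the sequence grows is what the principle asks to control.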

slide-12
SLIDE 12

Multiresolution Matrix Factorization (MMF)

slide-13
SLIDE 13

Classical approach: Define wavelets → Derive FWT

MMF approach: Prescribe form of FWT → Wavelets fall out

slide-14
SLIDE 14

Multiresolution Matrix Factorization

$Q_L \cdots Q_1 \, A \, Q_1^\top \cdots Q_L^\top \approx H$

  • Each $Q_\ell$ is super-sparse (Givens rotation or k–point rotation).
  • For some nested sequence of sets $[n] = S_1 \supseteq S_2 \supseteq \ldots \supseteq S_{L+1}$, $[Q_\ell]_{[n] \setminus S_\ell,\, [n] \setminus S_\ell} = I_{n - \delta_{\ell-1}}$.
  • H is core-diagonal.

Here A can be the Laplacian of a graph or any symmetric matrix.
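
A Givens rotation is the simplest example of a super-sparse Q: it equals the identity except on two coordinates. The sketch below builds one and picks its angle, Jacobi-style, so that conjugating a symmetric A zeroes the (i, j) entry. The helper names `givens` and `jacobi_angle` and the random test matrix are assumptions for illustration, not part of the MMF software.

```python
import numpy as np

def givens(n, i, j, theta):
    """n x n identity except for a 2x2 rotation acting on coordinates i and j."""
    Q = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    Q[i, i], Q[i, j] = c, -s
    Q[j, i], Q[j, j] = s, c
    return Q

def jacobi_angle(A, i, j):
    """Angle for which (Q A Q^T)[i, j] = 0 when Q = givens(n, i, j, angle)."""
    return 0.5 * np.arctan2(2.0 * A[i, j], A[j, j] - A[i, i])

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M + M.T                                  # any symmetric matrix

Q = givens(5, 1, 3, jacobi_angle(A, 1, 3))
B = Q @ A @ Q.T                              # one step A -> Q A Q^T
touched = np.count_nonzero(Q - np.eye(5))    # entries where Q differs from I
```

Only rows and columns 1 and 3 of A change, and the Frobenius norm is preserved, so the rotation moves mass from the off-diagonal entry onto the diagonal.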

slide-15
SLIDE 15

Multiresolution Matrix Factorization

$A \approx Q_1^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_1$

The columns of $Q_1^\top Q_2^\top \cdots Q_L^\top$ are a

  • Wavelet basis for the column space of A.
  • A multilevel sparse dictionary (hierarchically sparse PCA).

MMF structure is a generalization of the notion of rank.

slide-16
SLIDE 16

Computation

MMF reduces finding the wavelet basis to an optimization problem:

$\min_{[n] \supseteq S_1 \supseteq \ldots \supseteq S_L;\;\; H \in \mathcal{H}^{S_L}_n;\;\; Q_1, \ldots, Q_L \in \mathcal{Q}} \;\; \| A - Q_1^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_1 \|^2_{\mathrm{Frob}}$

for a given class $\mathcal{Q}$ of local rotations and dimensions $\delta_1 \ge \delta_2 \ge \ldots \ge \delta_L$.

Natural greedy optimization approach:

$A \to Q_1 A Q_1^\top \to Q_2 Q_1 A Q_1^\top Q_2^\top \to \ldots$

In practice combined with randomization and other tricks to make it fast.
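
The greedy scheme can be sketched in a few lines for the k = 2 (Givens) case. This is a simplified illustration, not the authors' algorithm: the pair-selection rule (largest off-diagonal entry among active indices), the rule for which index to retire as a wavelet, and the names `greedy_mmf`, `reconstruct` and `off_mass` are all assumptions chosen to keep the example short.

```python
import numpy as np

def off_mass(A, t, active):
    """Squared norm of row t restricted to the other active indices."""
    others = [a for a in active if a != t]
    return float(np.sum(A[t, others] ** 2))

def greedy_mmf(A, num_levels):
    """Return (rotations, wavelets, H) for a simplified greedy MMF.

    rotations: list of (i, j, theta), in the order Q_1, ..., Q_L
    wavelets:  index retired after each rotation
    H:         core-diagonal part of Q_L ... Q_1 A Q_1^T ... Q_L^T
    """
    A = np.array(A, dtype=float)
    active = list(range(A.shape[0]))
    rotations, wavelets = [], []
    for _ in range(num_levels):
        if len(active) < 2:
            break
        # Heuristic (assumed): rotate the active pair with the largest |A[i, j]|
        sub = np.abs(A[np.ix_(active, active)])
        np.fill_diagonal(sub, -1.0)
        p, q = np.unravel_index(np.argmax(sub), sub.shape)
        i, j = active[p], active[q]
        theta = 0.5 * np.arctan2(2.0 * A[i, j], A[j, j] - A[i, i])
        c, s = np.cos(theta), np.sin(theta)
        # A <- Q A Q^T, touching only rows and columns i and j
        ri, rj = A[i].copy(), A[j].copy()
        A[i], A[j] = c * ri - s * rj, s * ri + c * rj
        ci, cj = A[:, i].copy(), A[:, j].copy()
        A[:, i], A[:, j] = c * ci - s * cj, s * ci + c * cj
        rotations.append((i, j, theta))
        # Retire the index that is less coupled to the remaining active set
        drop = i if off_mass(A, i, active) <= off_mass(A, j, active) else j
        active.remove(drop)
        wavelets.append(drop)
    # Keep the diagonal everywhere plus the dense block on the active indices
    H = np.diag(np.diag(A))
    H[np.ix_(active, active)] = A[np.ix_(active, active)]
    return rotations, wavelets, H

def reconstruct(rotations, H):
    """Form Q_1^T ... Q_L^T H Q_L ... Q_1 by undoing rotations in reverse order."""
    B = H.copy()
    for i, j, theta in reversed(rotations):
        c, s = np.cos(theta), np.sin(theta)
        ri, rj = B[i].copy(), B[j].copy()
        B[i], B[j] = c * ri + s * rj, -s * ri + c * rj
        ci, cj = B[:, i].copy(), B[:, j].copy()
        B[:, i], B[:, j] = c * ci + s * cj, -s * ci + c * cj
    return B

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M + M.T
rotations, wavelets, H = greedy_mmf(A, num_levels=4)
error = np.linalg.norm(A - reconstruct(rotations, H))   # Frobenius norm
```

Because the rotations are orthogonal, `error` equals the Frobenius norm of the entries discarded when forming H, which is exactly the quantity the optimization problem minimizes.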

slide-17
SLIDE 17

Hierarchical structure

The sequence in which MMF (with k ≥ 3) eliminates dimensions induces a (soft) hierarchical clustering amongst the dimensions (mixture of trees).

slide-18
SLIDE 18

Applications

  • 1. Generate a wavelet basis for graphs/matrices.
  • 2. Reveal structural properties of graphs (communities).
  • 3. Generate graphs with hierarchical structure.
  • 4. Compress graphs and matrices (sketching).
  • 5. Fast approximate matrix inverse → preconditioner.
  • 6. Hierarchical scaffold for other fast numerics.
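
The fast approximate inverse follows from the factored form: if A ≈ Q₁ᵀ⋯Q_Lᵀ H Q_L⋯Q₁ with H core-diagonal, then A⁻¹ ≈ Q₁ᵀ⋯Q_Lᵀ H⁻¹ Q_L⋯Q₁, and inverting H only requires inverting the small core block and taking reciprocals of the remaining diagonal. The sketch below checks this on a synthetic matrix that is built to have exact MMF structure; the rotations, the core set and the helper names are assumptions for illustration.

```python
import numpy as np

def givens(n, i, j, theta):
    """n x n identity except for a 2x2 rotation acting on coordinates i and j."""
    Q = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    Q[i, i], Q[i, j] = c, -s
    Q[j, i], Q[j, j] = s, c
    return Q

def invert_core_diagonal(H, core):
    """Invert H that is diagonal except for a dense block on the `core` indices."""
    n = H.shape[0]
    rest = [t for t in range(n) if t not in core]
    H_inv = np.zeros_like(H)
    H_inv[rest, rest] = 1.0 / H[rest, rest]            # diagonal part
    H_inv[np.ix_(core, core)] = np.linalg.inv(H[np.ix_(core, core)])
    return H_inv

rng = np.random.default_rng(1)
n, core = 6, [0, 1]

# Synthetic core-diagonal H with a positive definite 2x2 core block
H = np.diag(rng.uniform(1.0, 2.0, size=n))
C = rng.standard_normal((2, 2))
H[np.ix_(core, core)] = C @ C.T + np.eye(2)

# Q = Q_L ... Q_1 assembled from a few sparse rotations
Q = np.eye(n)
for i, j in [(2, 3), (3, 4), (1, 5)]:
    Q = givens(n, i, j, rng.uniform(0.0, np.pi)) @ Q

A = Q.T @ H @ Q
A_inv = Q.T @ invert_core_diagonal(H, core) @ Q
```

For a real matrix the factorization is only approximate, so `A_inv` would serve as a preconditioner rather than an exact inverse.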

slide-19
SLIDE 19

Relationship to other algorithms

  • Treelets [Lee, Nadler & Wasserman, 2008]: special case with k = 2 and heuristic approach.
  • Diffusion wavelets [Coifman & Maggioni, 2006]: dual approach, with focus on smoothness rather than sparsity (leads to repeated Gram–Schmidt).
  • Fast multipole methods [Greengard & Rokhlin, 1987–]: aggregate at different scales.
  • Multigrid [Brandt, 1970s–]: solve complex problems at multiple scales that communicate with each other.
  • Hierarchical matrices [Hackbusch, Börm, Chandrasekaran,…]: H–matrices, H2–matrices, HSS matrices,...

slide-20
SLIDE 20

The pMMF library

slide-21
SLIDE 21

Highly optimized open source parallel C++ library:

  • Custom sparse matrix classes
  • Blocked matrices → parallelism
  • Randomization, etc.
  • Interface: C++ API/Matlab/command line/GUI.

http://people.cs.uchicago.edu/~risi/MMF/index.html

slide-22
SLIDE 22

Blocking and stages

Rows/columns are clustered, the matrix is blocked accordingly, and rotations are found within clusters. A run of rotations conforming to the same clustering structure is called a stage.

[Figure: block diagram involving A and Q_1^⊤]

Different columns of blocks (“towers”) can be sent to different processors.
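
The blocking idea can be illustrated with plain NumPy: given a cluster label for every row/column, the matrix is cut into blocks indexed by (row cluster, column cluster), and each block column forms one tower. This is only a toy sketch and does not reflect the data structures of the pMMF library; `block_by_clusters`, `tower` and the labels are hypothetical.

```python
import numpy as np

def block_by_clusters(A, labels):
    """Split A into a dict of blocks keyed by (row cluster, column cluster)."""
    labels = np.asarray(labels)
    clusters = {int(c): np.flatnonzero(labels == c) for c in np.unique(labels)}
    return {(a, b): A[np.ix_(rows, cols)]
            for a, rows in clusters.items()
            for b, cols in clusters.items()}

def tower(blocks, col_cluster):
    """Stack all blocks of one block column (a "tower") vertically."""
    keys = sorted(k for k in blocks if k[1] == col_cluster)
    return np.vstack([blocks[k] for k in keys])

A = np.arange(36, dtype=float).reshape(6, 6)
A = A + A.T                                   # symmetric example matrix
labels = [0, 1, 0, 2, 1, 0]                   # assumed clustering of rows/columns

blocks = block_by_clusters(A, labels)
towers = {c: tower(blocks, c) for c in sorted(set(labels))}
```

Each entry of `towers` holds all rows of the matrix for the columns of one cluster, so towers can be handed to different workers independently.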

slide-23
SLIDE 23

Reblocking

After the stage is complete, rows/columns are reclustered. It is critical that reblocking also be efficient.

slide-24
SLIDE 24

Matrix Free Arithmetic

When applying an MMF factorization to a vector, the vector must go through the same reblocking process.

$\cdots \, Q_3 \, Q_2 \, Q_1 \, v$
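
Applying the rotations to a vector never requires forming any Q as a matrix: each Givens rotation updates just two coordinates. The sketch below shows this for unblocked vectors and compares against the dense product; it leaves out the reblocking step, and the function names and rotation list are assumptions for illustration.

```python
import numpy as np

def apply_rotations(rotations, v):
    """Compute Q_L ... Q_1 v; `rotations` lists (i, j, theta) for Q_1 first."""
    v = np.array(v, dtype=float)
    for i, j, theta in rotations:
        c, s = np.cos(theta), np.sin(theta)
        # Right-hand side is evaluated before either coordinate is overwritten
        v[i], v[j] = c * v[i] - s * v[j], s * v[i] + c * v[j]
    return v

def dense_rotation(n, i, j, theta):
    """Explicit n x n matrix of the same rotation, used only for checking."""
    Q = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    Q[i, i], Q[i, j] = c, -s
    Q[j, i], Q[j, j] = s, c
    return Q

rotations = [(0, 1, 0.3), (1, 2, -1.1), (2, 4, 0.7)]
v = np.array([1.0, -2.0, 0.5, 3.0, 4.0])

fast = apply_rotations(rotations, v)

dense = v.copy()
for i, j, theta in rotations:
    dense = dense_rotation(5, i, j, theta) @ dense
```

The matrix-free version costs a constant number of operations per rotation, whereas the dense check costs O(n²) per rotation.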

slide-25
SLIDE 25

Graph demo

slide-26
SLIDE 26

Compression results

slide-27
SLIDE 27

Preconditioning results

slide-28
SLIDE 28

Wall clock time

slide-29
SLIDE 29

Further Applications

[Meneveau] [Lieberman-Aiden et al., 2009]

slide-30
SLIDE 30

CONCLUSIONS

slide-31
SLIDE 31

Conclusions

  • Matrices coming from data are usually NOT like
      • Random matrices
      • Worst case matrices
      • Low rank matrices.
  • Large–scale problems can only be solved by breaking them into smaller ones.
  • In Applied Math there is a long tradition of this, but it is not obvious how to translate it to less structured settings.
  • MMF is a way to find the multiresolution structure in data and exploit it for both computational and statistical ends.
  • Multiresolution structure is an alternative to the notion of rank.

slide-32
SLIDE 32

Acknowledgements

Co-authors:

  • Nedelina Teneva (UChicago)
  • Pramod Mudrakarta (UChicago)
  • Vikas Garg (MIT)

Thanks:

  • Andreas Krause and Joel Tropp.