


SLIDE 1

Parametrizations of manifolds with heat kernels, multiscale analysis on graphs, and applications to analysis of data sets

Mauro Maggioni

Mathematics and Computer Science Duke University

U.S.C./I.M.I., Columbia, 3/5/08

In collaboration with R.R. Coifman, P.W. Jones, R. Schul, A.D. Szlam.

Funding: NSF-DMS, ONR.

Mauro Maggioni Heat kernels and multiscale analysis on manifolds

SLIDE 2

Plan

  • Review of Setting and Motivation
  • Graphs from data sets
  • Eigenfunction and heat kernel embeddings
  • Multiscale construction, diffusion wavelets
  • Examples and applications
  • Conclusion

SLIDE 3

Low-dimensional sets in high-dimensional spaces

It has been shown, at least empirically, that in such situations the geometry of the data can help construct useful priors for tasks such as classification and regression for prediction purposes.

Problems:

  • geometric: find intrinsic properties, such as local dimensionality, and local parameterizations.
  • approximation theory: approximate functions on such data, respecting the geometry.

SLIDE 4

Random walks and heat kernels on the data

Assume the data $X = \{x_i\} \subset \mathbb{R}^n$. Assume we can assign local similarities via a kernel function $K(x_i, x_j) \ge 0$. Example: $K_\sigma(x_i, x_j) = e^{-\|x_i - x_j\|^2/\sigma}$.

Model the data as a weighted graph $(G, E, W)$: vertices represent data points, edges connect $x_i, x_j$ with weight $W_{ij} := K(x_i, x_j)$, when positive. Let $D_{ii} = \sum_j W_{ij}$ and

$$P = D^{-1} W \ \text{(random walk)}, \qquad T = D^{-1/2} W D^{-1/2} \ \text{(symm. ``random walk'')}, \qquad H = e^{-t(I-T)} \ \text{(heat kernel)}.$$

Note 1: K typically depends on the type of data.

Note 2: K should be "local", i.e. close to 0 for points not sufficiently close.
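The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a small random point cloud and an arbitrary bandwidth σ = 1 and time t = 2 (none of these values come from the talk); the function name `diffusion_operators` is hypothetical. The heat kernel is computed through the eigendecomposition of the symmetric matrix T.

```python
import numpy as np

def diffusion_operators(X, sigma, t):
    """Build W, P, T and H = exp(-t (I - T)) from points X (one point per row)."""
    # Pairwise squared Euclidean distances ||x_i - x_j||^2.
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)   # clip tiny negative round-off

    W = np.exp(-sq_dists / sigma)          # W_ij = K_sigma(x_i, x_j)
    d = W.sum(axis=1)                      # D_ii = sum_j W_ij
    P = W / d[:, None]                     # P = D^{-1} W, rows sum to 1
    T = W / np.sqrt(np.outer(d, d))        # T = D^{-1/2} W D^{-1/2}, symmetric

    # T is symmetric, so H = V diag(exp(-t (1 - lambda))) V^T.
    lam, V = np.linalg.eigh(T)
    H = (V * np.exp(-t * (1.0 - lam))) @ V.T
    return W, P, T, H

rng = np.random.default_rng(0)             # assumed data: 50 points in R^3
X = rng.normal(size=(50, 3))
W, P, T, H = diffusion_operators(X, sigma=1.0, t=2.0)
```

For large data sets one would typically sparsify W (Note 2) instead of forming dense matrices as done here.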

SLIDE 7

Handwritten Digits

Database of about 60,000 28 × 28 gray-scale pictures of handwritten digits, collected by USPS. Point cloud in $\mathbb{R}^{28^2}$. Goal: automatic recognition.

Figure: Set of 10,000 pictures (28 by 28 pixels) of the 10 handwritten digits. Color represents the label (digit) of each point.

SLIDE 8

Eigenfunction Embedding theorems, I

[Joint with P.W. Jones and R. Schul]

We ask whether eigenfunctions of the Laplacian can be used to parametrize Euclidean domains and manifolds, in which generality this may be true, and which conditions such an embedding may satisfy.

Other recently proposed techniques include Isomap, LLE, Hessian eigenmaps, maximum variance embedding; we are aware of proven results only for Isomap and Hessian eigenmaps, and in both cases the assumptions require the manifold to be the isometric image of a Euclidean domain. Also, Bérard, Besson and Gallot ('84, '94) use all the eigenfunctions to embed into ℓ2.

SLIDE 9

Eigenfunction Embedding theorems, I (cont’d)

Independently of the boundary conditions, we will denote by ∆ the Laplacian on Ω. For the purpose of this paper (both the Dirichlet and Neumann case) we restrict our study to domains where the spectrum is discrete and the corresponding heat kernel can be written as

$$K^{\Omega}_t(z, w) = \sum_{j \ge 0} \varphi_j(z)\,\varphi_j(w)\,e^{-\lambda_j t},$$

where the $\{\varphi_j\}$ form an orthonormal basis for the appropriate Hilbert space with eigenvalues $0 \le \lambda_0 \le \dots \le \lambda_j \le \dots$. We also require

$$\#\{j : \lambda_j \le T\} \le C_{\mathrm{Weyl},\Omega}\, T^{d/2}\, |\Omega| .$$

Dirichlet case: OK, Neumann: possible problems.
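Both requirements can be checked by hand in the simplest Dirichlet example, Ω = (0, 1) with d = 1 and |Ω| = 1, where the eigenfunctions are √2 sin(jπx) and the eigenvalues are (jπ)² (indexed from j = 1 here, not from 0 as on the slide). The sketch below is an illustration under those assumptions; the function names are hypothetical. It evaluates the truncated spectral sum, compares it with the free Gaussian kernel away from the boundary, and counts eigenvalues for the Weyl bound, which holds here with constant 1/π.

```python
import numpy as np

def dirichlet_heat_kernel(z, w, t, n_terms=200):
    """Truncated spectral sum for the Dirichlet heat kernel on (0, 1)."""
    j = np.arange(1, n_terms + 1)
    lam = (j * np.pi) ** 2                        # eigenvalues lambda_j
    phi_z = np.sqrt(2.0) * np.sin(j * np.pi * z)  # phi_j(z)
    phi_w = np.sqrt(2.0) * np.sin(j * np.pi * w)  # phi_j(w)
    return float(np.sum(phi_z * phi_w * np.exp(-lam * t)))

def free_heat_kernel(z, w, t):
    """Heat kernel of u_t = u_xx on the whole real line."""
    return float(np.exp(-(z - w) ** 2 / (4.0 * t)) / np.sqrt(4.0 * np.pi * t))

def weyl_count(T):
    """Number of Dirichlet eigenvalues (j*pi)^2 on (0, 1) that are <= T."""
    return int(np.floor(np.sqrt(T) / np.pi))

# For small t and points far from the boundary the two kernels agree closely.
gap = abs(dirichlet_heat_kernel(0.5, 0.55, 1e-3) - free_heat_kernel(0.5, 0.55, 1e-3))
```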

SLIDE 10

Eigenfunction Embedding theorems, II

Theorem (Embedding via Eigenfunctions, for Euclidean domains). Let Ω be a domain in $\mathbb{R}^d$, with $|\Omega| = 1$, and boundary as above. There are constants $c_1, \dots, c_6 > 0$ that depend only on d and $C_{\mathrm{Weyl},\Omega}$, such that the following hold. For any $z \in \Omega$, let $R_z \le \mathrm{dist}(z, \partial\Omega)$. Then there exist $i_1, \dots, i_d$ and constants $c_6 R_z^{d/2} \le \gamma_1 = \gamma_1(z), \dots, \gamma_d = \gamma_d(z) \le 1$ such that:

(a) $\Phi : B_{c_1 R_z}(z) \to \mathbb{R}^d$, defined by $x \mapsto (\gamma_1 \varphi_{i_1}(x), \dots, \gamma_d \varphi_{i_d}(x))$, satisfies, for any $x_1, x_2 \in B(z, c_1 R_z)$,

$$\frac{c_2}{R_z}\,\|x_1 - x_2\| \le \|\Phi(x_1) - \Phi(x_2)\| \le \frac{c_3}{R_z}\,\|x_1 - x_2\| .$$

(b) $c_4 R_z^{-2} \le \lambda_{i_1}, \dots, \lambda_{i_d} \le c_5 R_z^{-2}$.

SLIDE 11

Eigenfunction Embedding theorems, III

Figure: Top left: a non-simply connected domain in R2, and the point z with its neighborhood to be mapped. center and left: Two eigenfunctions for mapping the neighborhood to roughly a unit ball.

SLIDE 12

Eigenfunction Embedding theorems, IV

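Let M be a smooth, d-dimensional compact manifold, possibly with boundary. Suppose we are given a metric tensor g on M which is $C^\alpha$ for some $\alpha > 0$. For any $z_0 \in M$, let $(U, x)$ be a coordinate chart such that $z_0 \in U$, $g^{il}(x(z_0)) = \delta^{il}$ and for any $w \in U$, and any $\xi, \nu \in \mathbb{R}^d$,

$$c_{\min}(g)\,\|\xi\|^2_{\mathbb{R}^d} \le \sum_{i,j=1}^{d} g^{ij}(x(w))\,\xi_i \xi_j, \qquad \sum_{i,j=1}^{d} g^{ij}(x(w))\,\xi_i \nu_j \le c_{\max}(g)\,\|\xi\|_{\mathbb{R}^d}\,\|\nu\|_{\mathbb{R}^d} .$$

We let $r_M(z_0) = \sup\{r > 0 : B_r(x(z_0)) \subseteq x(U)\}$.

$$\Delta_M f(x) = -\frac{1}{\sqrt{\det g}} \sum_{i,j=1}^{d} \partial_j \left( \sqrt{\det g}\; g^{ij}(x)\, \partial_i f \right)(x) .$$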

SLIDE 13

Eigenfunction Embedding theorems, IV

Theorem (Embedding via Eigenfunctions, for Manifolds). Let $(M, g)$, $z \in M$ be a d-dimensional manifold and $(U, x)$ be a chart as above. Also, assume $|M| = 1$. There are constants $c_1, \dots, c_6 > 0$, depending on d, $c_{\min}$, $c_{\max}$, $\|g\|_{\alpha \wedge 1}$, $\alpha \wedge 1$, and $C_{\mathrm{Weyl},\Omega}$, such that the following hold. Let $R_z = r_M(z)$. Then there exist $i_1, \dots, i_d$ and constants $c_6 R_z^{d/2} \le \gamma_1 = \gamma_1(z), \dots, \gamma_d = \gamma_d(z) \le 1$ such that:

(a) the map $\Phi : B_{c_1 R_z}(z) \to \mathbb{R}^d$, defined by $x \mapsto (\gamma_1 \varphi_{i_1}(x), \dots, \gamma_d \varphi_{i_d}(x))$, satisfies, for any $x_1, x_2 \in B(z, c_1 R_z)$,

$$\frac{c_2}{R_z}\, d_M(x_1, x_2) \le \|\Phi(x_1) - \Phi(x_2)\| \le \frac{c_3}{R_z}\, d_M(x_1, x_2) .$$

(b) $c_4 R_z^{-2} \le \lambda_{i_1}, \dots, \lambda_{i_d} \le c_5 R_z^{-2}$.

SLIDE 14

Robust parametrizations through heat kernels

Theorem (Heat Triangulation Theorem, with P.W. Jones, R. Schul). Let $(M, g)$ be a Riemannian manifold, with g at least $C^\alpha$, $\alpha > 0$, and $z \in M$. Let $R_z$ be the radius of the largest ball on M, centered at z, which is bi-Lipschitz equivalent to a Euclidean ball. Let $p_1, \dots, p_d$ be d linearly independent directions. There are constants $c_1, \dots, c_6 > 0$, depending on d, $c_{\min}$, $c_{\max}$, $\|g\|_{\alpha \wedge 1}$, $\alpha \wedge 1$, and the smallest and largest eigenvalues of the Gramian matrix $(\langle p_i, p_j \rangle)_{i,j=1,\dots,d}$, such that the following holds. Let $y_i$ be so that $y_i - z$ is in the direction $p_i$, with $c_4 R_z \le d_M(y_i, z) \le c_5 R_z$ for each $i = 1, \dots, d$, and let $t_z = c_6 R_z^2$. The map

$$\Phi : B_{c_1 R_z}(z) \to \mathbb{R}^d, \qquad x \mapsto \left( R_z^d K_{t_z}(x, y_1), \dots, R_z^d K_{t_z}(x, y_d) \right)$$

satisfies, for any $x_1, x_2 \in B_{c_1 R_z}(z)$,

$$\frac{c_2}{R_z}\, d_M(x_1, x_2) \le \|\Phi(x_1) - \Phi(x_2)\| \le \frac{c_3}{R_z}\, d_M(x_1, x_2) .$$
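A numerical illustration of the statement in the flat case M = R² may help, where the heat kernel is the explicit Gaussian. The sketch below is not the theorem's construction in general: it assumes Euclidean space, coordinate directions p_i = e_i, anchors at distance exactly R, t = R², and a ball of radius 0.1 R, all chosen for illustration. It measures the ratio ||Φ(x1) − Φ(x2)|| / ||x1 − x2|| over random pairs.

```python
import numpy as np

def heat_kernel(x, y, t):
    """Euclidean heat kernel on R^d: (4 pi t)^(-d/2) exp(-|x - y|^2 / (4 t))."""
    d = x.shape[-1]
    sq = np.sum((x - y) ** 2, axis=-1)
    return np.exp(-sq / (4.0 * t)) / (4.0 * np.pi * t) ** (d / 2.0)

def heat_triangulation(x, z, R, directions):
    """Phi(x) = (R^d K_t(x, y_1), ..., R^d K_t(x, y_d)), y_i = z + R p_i, t = R^2."""
    d = z.shape[0]
    anchors = z + R * directions               # one anchor y_i per row
    return np.stack([R ** d * heat_kernel(x, y, R ** 2) for y in anchors], axis=-1)

def distortion_range(R, n_pairs=2000, seed=0):
    """Min and max of |Phi(x1) - Phi(x2)| / |x1 - x2| for pairs in B(z, 0.1 R)."""
    rng = np.random.default_rng(seed)
    z, directions = np.zeros(2), np.eye(2)

    def sample(n):
        # Uniform points in the disc of radius 0.1 R around z = 0.
        u = rng.normal(size=(n, 2))
        u /= np.linalg.norm(u, axis=1, keepdims=True)
        return 0.1 * R * u * np.sqrt(rng.uniform(size=(n, 1)))

    x1, x2 = sample(n_pairs), sample(n_pairs)
    num = np.linalg.norm(heat_triangulation(x1, z, R, directions)
                         - heat_triangulation(x2, z, R, directions), axis=1)
    ratio = num / np.linalg.norm(x1 - x2, axis=1)
    return ratio.min(), ratio.max()

lo1, hi1 = distortion_range(1.0)
lo2, hi2 = distortion_range(2.0)
```

In this flat setting the map is exactly scale invariant, so doubling R halves both bounds, which matches the c/R_z scaling in the theorem.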

SLIDE 15

Idea of proof

When $t \sim \lambda^{-1} \sim R_z^2$, prove that the heat kernel resembles the Euclidean Dirichlet heat kernel in a ball, in terms of its size and gradient, from above and below, with constants very weakly dependent on the smoothness of the manifold.

This will give the heat triangulation theorem, since the heat kernel has the correct gradient estimates.

For the eigenfunction theorem, look at the spectral expansion of the heat kernel, and observe that the main contribution to that series comes from frequencies in the correct range. So not all eigenfunctions in that range have small gradient → pigeon-hole → find eigenfunction with gradient of the correct size in a given direction → repeat over directions, each orthogonal to the span of the gradients of the previously chosen eigenfunctions.
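The pigeon-hole step can be illustrated in the simplest setting, the Dirichlet Laplacian on the interval (0, 1), where the eigenfunctions are √2 sin(jπx) with eigenvalues (jπ)². The sketch below searches the band c4 R_z⁻² ≤ λ_j ≤ c5 R_z⁻² for the eigenfunction with the largest derivative at z. The interval, the band constants c4 = 1 and c5 = 20, and the function name are assumptions made for illustration, not the constants of the theorem.

```python
import numpy as np

def pick_eigenfunction(z, c4=1.0, c5=20.0):
    """Return (j, phi_j'(z)) for the in-band eigenfunction with largest |phi_j'(z)|."""
    R = min(z, 1.0 - z)                           # distance from z to the boundary
    j = np.arange(1, 200)
    lam = (j * np.pi) ** 2                        # eigenvalues lambda_j
    in_band = (lam >= c4 / R**2) & (lam <= c5 / R**2)
    # Derivative of sqrt(2) sin(j pi x) at x = z.
    grad = np.sqrt(2.0) * j * np.pi * np.cos(j * np.pi * z)
    candidates = j[in_band]
    best = candidates[np.argmax(np.abs(grad[in_band]))]
    return int(best), float(grad[best - 1])

j_mid, grad_mid = pick_eigenfunction(0.5)
j_quarter, grad_quarter = pick_eigenfunction(0.25)
```

At z = 0.5 the first eigenfunction has zero derivative, so the search settles on j = 2. In dimension d > 1 the same search would be repeated over directions orthogonal to the gradients already chosen, which this one-dimensional sketch does not show.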

SLIDE 16

Idea of proof, II

How to prove smoothness-independent heat kernel estimates? Start with manifold with smooth metric, use probability:

Theorem. Let $x, y \in B_{\delta_0 R_z}(z)$ be such that $\|x - y\| < \delta_0 R_z$, $\delta_0 < \frac{1}{4}$. Let the $\tau_n$'s be the return times in $B_{\frac{3}{2}\delta_0 R_z}(x)$ after exiting $B_{2\delta_0 R_z}(x)$, and $x_n(\omega) = \omega(\tau_n(\omega))$. Then

$$K_s(x, y) = K_s^{\mathrm{Dir}(B_{2\delta_0 R_z}(x))}(x, y) + \sum_{n=1}^{+\infty} \Big[ K_{s - \tau_n(\omega)}^{\mathrm{Dir}(B_{2\delta_0 R_z}(x))}(x_n(\omega), y)\, \chi_{\{\tau_n(\omega) < s\}}(\omega) \Big]\, P(\tau_n < s) . \qquad (1)$$

Moreover there exists an $M = M(c_{\max})$ such that

$$P(\tau_n < s) \lesssim_{d, M, c_{\min}, c_{\max}} e^{-n \frac{(\delta_0 R_z / 2)^2}{2Ms}} .$$

Then take limits of smooth metrics to the $C^\alpha$ metric. Pretty easy for the heat kernel, some tricks (time-stopping arguments) for the gradient.

SLIDE 17

Still to do

Currently working on the discrete analogue, and implementation, of the above. Obstacles we are overcoming:

  • Intrinsic dimension d unknown. Tools to overcome: dimension estimation through multiscale local PCA and/or through multiscale heat kernel time decay.
  • $R_z$ is unknown. Tools to overcome: forget about finding $R_z$! Greedily find the largest ball around z for which the heat kernel triangulation works. The theorem guarantees it will work at least on a large ball.
  • Computational cost. Want linear in n; this is not trivial if heat kernels at "medium time" (say, √n) are needed. Tools: multiscale analysis of the heat kernel.
  • Discrete data sets have a geometry which is more complicated than $C^{1+\alpha}$ manifolds: dimensionality changes from point to point, strange things can happen to eigenfunctions and heat kernels, etc.
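The first tool, dimension estimation through multiscale local PCA, can be sketched as follows. This is a simplified illustration, not the estimator used by the authors: it counts the covariance eigenvalues of a neighborhood that exceed a fixed fraction of the largest one, at several radii. The synthetic data (a noisy 2-D patch embedded linearly in R⁵), the threshold 0.1 and the radii are all assumptions.

```python
import numpy as np

def local_pca_dimension(X, center, radius, rel_threshold=0.1):
    """Count covariance eigenvalues of the points within `radius` of `center`
    that exceed rel_threshold times the largest eigenvalue."""
    nbrs = X[np.linalg.norm(X - center, axis=1) <= radius]
    cov = np.cov(nbrs, rowvar=False)
    eig = np.linalg.eigvalsh(cov)[::-1]            # eigenvalues, descending
    return int(np.sum(eig > rel_threshold * eig[0]))

rng = np.random.default_rng(0)
# Assumed data: a unit square patch mapped into R^5 by an orthonormal 5 x 2 basis,
# plus small isotropic noise.
basis, _ = np.linalg.qr(rng.normal(size=(5, 2)))
coords = rng.uniform(size=(2000, 2))
X = coords @ basis.T + 1e-3 * rng.normal(size=(2000, 5))
center = np.array([0.5, 0.5]) @ basis.T

# Estimate the dimension at several scales around the same point.
dims = {r: local_pca_dimension(X, center, r) for r in (0.1, 0.2, 0.4)}
```

Looking across scales matters because at radii comparable to the noise level the eigenvalue count reflects the noise and not the underlying set.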

SLIDE 18

Analysis on the set

Equipped with good systems of coordinates on large pieces of the set, one can start doing analysis and approximation intrinsically on the set.

  • Fourier analysis on data: use eigenfunctions for function approximation. OK for globally uniformly smooth functions. Conjecture: most functions of interest are not in this class (Belkin, Niyogi, Coifman, Lafon).
  • Diffusion wavelets: can construct multiscale analysis of wavelet-like functions on the set, adapted to the geometry of diffusion, at different time scales (joint with R. Coifman).
  • The diffusion semigroup itself on the data can be used as a smoothing kernel. We recently obtained very promising results in image denoising and semisupervised learning (in a few slides, joint with A.D. Szlam and R. Coifman).
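As a toy illustration of the last point, the sketch below applies a few steps of a random walk P = D⁻¹W to a noisy function on points sampled from a circle. It is plain linear diffusion smoothing under assumed parameters (200 points, kernel bandwidth 0.01, noise level 0.3, 5 steps), not the denoising method developed with Szlam and Coifman.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
theta = 2.0 * np.pi * np.arange(n) / n
X = np.column_stack([np.cos(theta), np.sin(theta)])   # points on the unit circle

# Gaussian affinities and random-walk matrix P = D^{-1} W.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dists / 0.01)
P = W / W.sum(axis=1, keepdims=True)

clean = np.sin(theta)                                 # smooth function on the set
noisy = clean + 0.3 * rng.normal(size=n)

smoothed = noisy.copy()
for _ in range(5):                                    # applies P^5 to the signal
    smoothed = P @ smoothed

mse_noisy = np.mean((noisy - clean) ** 2)
mse_smoothed = np.mean((smoothed - clean) ** 2)
```

The number of steps trades noise reduction against bias: more steps average over larger neighborhoods and eventually flatten the function itself.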

SLIDE 19

The Heat Kernel and the Laplacian on Manifolds

Starting point: the heat kernel and diffusion(s). Use a random walk T on the data as a way of exploring it. Ingredient needed: probability of jumping from each point to a neighboring point (e.g. from one document to a similar one, from one state of the molecule to a close-by state, etc.). This is computed as a function of the given similarity (= weight of the edges).

Different ways of using this diffusion process:

  • Look at T for very large time ($T^t$ for t large) → eigenfunctions of T → Fourier analysis (and basis) on the data.
  • Look at T for small time ($T, T^2, \dots, T^k$, k constant) → it is diffusion on the set → "PDE" method, "no basis".
  • Look at T at all time scales ($T, T^2, T^4, \dots, T^{2^j}, \dots$) → multiscale analysis of both functions and the diffusion process → wavelets and multiscale dynamical processes.

Connections: Heat kernel - Brownian motion - potential theory - the heat equation - eigenfunctions of the Laplacian.

SLIDE 20

Multiscale Analysis

We construct multiscale analyses associated with a diffusion-like process T on a space X, be it a manifold, a graph, or a point cloud. This gives:

(i) A coarsening of X at different "geometric" scales, in a chain $X \to X_1 \to X_2 \to \dots \to X_j \to \dots$;

(ii) A coarsening (or compression) of the process T at all time scales $t_j = 2^j$, $\{T_j = [T^{2^j}]_{\Phi_j}^{\Phi_j}\}_j$, each acting on the corresponding $X_j$;

(iii) A set of wavelet-like basis functions for analysis of functions (observables) on the manifold/graph/point cloud/set of states of the system.

All the above come with guarantees: the coarsened system $X_j$ and coarsened process $T_j$ behave ε-closely to $T^{2^j}$ on X. This comes at the cost of a very careful coarsening: up to $O(|X|^2)$ operations ($< O(|X|^3)$!), and only $O(|X|)$ in certain special classes of problems.
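Item (ii) can be sketched for a symmetric T as follows. This is a simplified stand-in, not the diffusion wavelet algorithm itself: it uses a dense SVD to find an orthonormal basis of the numerical range at each level, where the actual construction uses a localized orthogonalization, and it therefore costs O(|X|³) instead of the figures quoted above. The test matrix, the threshold 1e-6 and all names are assumptions.

```python
import numpy as np

def compress_dyadic_powers(T, eps=1e-6, n_levels=6):
    """Return [(basis_j, T_j)], where T_j represents T^(2^j) on a shrinking subspace.

    basis_j is an n x n_j matrix with orthonormal columns (in original coordinates);
    T_j is the n_j x n_j compressed operator.
    """
    basis, Tj = np.eye(T.shape[0]), T.copy()
    out = [(basis, Tj)]
    for _ in range(n_levels):
        # Orthonormal basis for the eps-numerical range of T_j.
        U, s, _ = np.linalg.svd(Tj)
        Q = U[:, s > eps]
        # Square the operator and express it on the new, smaller basis.
        Tj = Q.T @ (Tj @ Tj) @ Q
        basis = basis @ Q
        out.append((basis, Tj))
    return out

# Assumed example: symmetric diffusion on 60 random points of the unit interval.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(size=60))
W = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.01)
d = W.sum(axis=1)
T = W / np.sqrt(np.outer(d, d))

pyramid = compress_dyadic_powers(T)
sizes = [Tj.shape[0] for _, Tj in pyramid]       # dimensions shrink with the level
basis_J, T_J = pyramid[-1]
error = np.linalg.norm(basis_J @ T_J @ basis_J.T - np.linalg.matrix_power(T, 2 ** 6), 2)
```

The final compressed operator reproduces T to the power 2⁶ up to an error controlled by the threshold, while acting on far fewer coordinates.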

SLIDE 21

Coarsening of Markov chains

We now consider a simple example of a Markov chain on a graph with 8 states.

$$T = \begin{pmatrix}
0.80 & 0.20 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.20 & 0.79 & 0.01 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.01 & 0.49 & 0.50 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.50 & 0.499 & 0.001 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.001 & 0.499 & 0.50 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.49 & 0.01 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.01 & 0.49 & 0.50 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.50
\end{pmatrix}$$

From the matrix it is clear that the states are grouped into four pairs $\{\nu_1, \nu_2\}$, $\{\nu_3, \nu_4\}$, $\{\nu_5, \nu_6\}$, and $\{\nu_7, \nu_8\}$, with weak interactions between the pairs.
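The grouping can also be read off numerically from the dyadic powers of this chain. The sketch below computes them by repeated squaring and counts singular values above a threshold; the threshold 1e-6 and the helper name `numerical_rank` are assumptions chosen for illustration.

```python
import numpy as np

T = np.array([
    [0.80, 0.20, 0.00, 0.00,  0.00,  0.00, 0.00, 0.00],
    [0.20, 0.79, 0.01, 0.00,  0.00,  0.00, 0.00, 0.00],
    [0.00, 0.01, 0.49, 0.50,  0.00,  0.00, 0.00, 0.00],
    [0.00, 0.00, 0.50, 0.499, 0.001, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.001, 0.499, 0.50, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00,  0.50,  0.49, 0.01, 0.00],
    [0.00, 0.00, 0.00, 0.00,  0.00,  0.01, 0.49, 0.50],
    [0.00, 0.00, 0.00, 0.00,  0.00,  0.00, 0.50, 0.50],
])

def numerical_rank(M, eps=1e-6):
    """Number of singular values of M larger than eps."""
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > eps))

# powers[j] holds T^(2^j), obtained by squaring the previous entry.
powers = [T]
for _ in range(13):
    powers.append(powers[-1] @ powers[-1])

ranks = [numerical_rank(M) for M in powers]
```

With this threshold the rank drops from 8 for T to 4 at the sixth dyadic power, one dimension per pair, and to 2 at the thirteenth, where only the weakest link (0.001, between the second and third pairs) still separates the chain into two halves.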

SLIDE 22

Coarsening of Markov chains, II

Some dyadic powers of the Markov chain T. Compressed representations T6, T13, and corresponding scaling functions.

SLIDE 23

Multiscale Analysis, I

We would like to construct multiscale bases, generalizing classical wavelets, on manifolds, graphs, point clouds.

The classical construction is based on geometric transformations (such as dilations, translations) of the space, transformed into actions (e.g. via representations) on functions. There are plenty of such transformations on $\mathbb{R}^n$. Here the space is in general highly non-symmetric, not invariant under "natural" geometric transformations, and moreover it is "noisy".

Idea: use diffusion and the heat kernel as dilations, acting on functions on the space, to generate multiple scales. This is related to work on Littlewood-Paley theory of Markov semigroups (E. Stein).

SLIDE 24

Multiscale Analysis, II

Suppose for simplicity we have a weighted graph (G, E, W), with corresponding Laplacian L and random walk P. Let us renormalize, if necessary, P so it has norm 1 as an operator on $L^2$: let T be this operator. Assume for simplicity that T is self-adjoint, and high powers of T are low-rank: T is a diffusion, so the range of $T^t$ is spanned by smooth functions of increasingly (in t) smaller gradient. A "typical" spectrum for the powers of T would look like this:

SLIDE 25

Classical Multi-Resolution Analysis

A Multi-Resolution Analysis for $L^2(\mathbb{R})$ is a sequence of subspaces $\{V_j\}_{j \in \mathbb{Z}}$, $V_{j-1} \subseteq V_j$, with $\bigcap_j V_j = \{0\}$, $\overline{\bigcup_j V_j} = L^2(\mathbb{R})$, with an orthonormal basis $\{\varphi_{j,k}\}_{k \in \mathbb{Z}} := \{2^{j/2} \varphi(2^j \cdot - k)\}_{k \in \mathbb{Z}}$ for $V_j$. Then there exists ψ such that $\{\psi_{j,k}\}_{k \in \mathbb{Z}} := \{2^{j/2} \psi(2^j \cdot - k)\}_{k \in \mathbb{Z}}$ spans $W_j$, the orthogonal complement of $V_{j-1}$ in $V_j$.

$\hat{\varphi}_{j,k}$ is essentially supported in $\{|\xi| \le 2^j\}$, and $\hat{\psi}_{j,k}$ is essentially supported in the L.P.-annulus $2^{j-1} \le |\xi| \le 2^j$.

Because $V_{j-1} \subseteq V_j$, $\varphi_{j-1,0} = \sum_{k'} \alpha_{k'} \varphi_{j,k'}$: refinement eqn.s, FWT.

We would like to generalize this construction to graphs. The frequency domain is the spectrum of $e^{-L}$. Let $V_j := \mathrm{span}\{\varphi_i : \lambda_i^{2^j} \ge \epsilon\}$. Would like o.n. basis of well-localized functions for $V_j$, and to derive refinement equations and downsampling rules in this context.
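For the classical part, the refinement equation and the fast wavelet transform (FWT) are easiest to see for the Haar system, where φ is the indicator of [0, 1) and the refinement coefficients are both 1/√2. The slide does not name a particular wavelet, so Haar is an assumption chosen here as the simplest instance; the function names are hypothetical.

```python
import numpy as np

def haar_step(coeffs):
    """One FWT level: split fine scaling coefficients into coarse scaling
    coefficients and wavelet coefficients (even-length input)."""
    even, odd = coeffs[0::2], coeffs[1::2]
    scaling = (even + odd) / np.sqrt(2.0)   # refinement equation, both weights 1/sqrt(2)
    wavelet = (even - odd) / np.sqrt(2.0)   # orthogonal complement of the coarse space
    return scaling, wavelet

def haar_inverse_step(scaling, wavelet):
    """Invert haar_step exactly."""
    out = np.empty(2 * scaling.size)
    out[0::2] = (scaling + wavelet) / np.sqrt(2.0)
    out[1::2] = (scaling - wavelet) / np.sqrt(2.0)
    return out

def haar_fwt(signal):
    """Full transform of a length-2^J signal: coarsest scaling coefficient
    plus wavelet coefficients listed from finest to coarsest level."""
    scaling, details = np.asarray(signal, dtype=float), []
    while scaling.size > 1:
        scaling, w = haar_step(scaling)
        details.append(w)
    return scaling, details

x = np.arange(8.0)
s, details = haar_fwt(x)

# Reconstruct by undoing the levels from coarsest to finest.
rec = s
for w in reversed(details):
    rec = haar_inverse_step(rec, w)
```

Each level halves the number of scaling coefficients, which is the downsampling rule one would like to reproduce on graphs.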

SLIDE 26

Construction of Diffusion Wavelets

SLIDE 27

Signal Processing on Manifolds

From left to right: function F; reconstruction of the function F with top 50 best basis packets; reconstruction with top 200 eigenfunctions of the Laplace-Beltrami operator.

Left to right: 50 top coefficients of F in its best diffusion wavelet basis; distribution of the coefficients of F in the delta basis; first 200 coefficients of F in the best basis and in the basis of eigenfunctions.

SLIDE 28

Diffusion Wavelets on Dumbbell manifold

SLIDE 29

Example: Multiscale text document organization

Scaling functions at different scales represented on the set embedded in $\mathbb{R}^3$ via $(\xi_3(x), \xi_4(x), \xi_5(x))$.

  • $\varphi_{3,4}$ is about Mathematics, but in particular applications to networks, encryption and number theory;
  • $\varphi_{3,10}$ is about Astronomy, but in particular papers in X-ray cosmology, black holes, galaxies;
  • $\varphi_{3,15}$ is about Earth Sciences, but in particular earthquakes;
  • $\varphi_{3,5}$ is about Biology and Anthropology, but in particular about dinosaurs;
  • $\varphi_{3,2}$ is about Science and talent awards, inventions and science competitions.

SLIDE 30

Doc/Word multiscales

Scaling Fcn | Document Titles | Words
ϕ2,3 | Acid rain and agricultural pollution Nitrogen's Increasing Impact in agriculture | nitrogen, plant, ecologist, carbon, global
ϕ3,3 | Racing the Waves Seismologists catch quakes Tsunami! At Lake Tahoe? How a middling quake made a giant tsunami Waves of Death Seabed slide blamed for deadly tsunami Earthquakes: The deadly side of geometry | earthquake, wave, fault, quake, tsunami
ϕ3,5 | Hunting Prehistoric Hurricanes Extreme weather: Massive hurricanes Clearing the Air About Turbulence New map defines nation's twister risk Southern twisters Oklahoma Tornado Sets Wind Record | tornado, storm, wind, tornadoe, speed

Some examples of scaling functions on the documents, with some of the documents in their support, and some of the words most frequent in the documents.

SLIDE 31

Thinking multiscale on graphs...

Other constructions:

  • Biorthogonal diffusion wavelets, in which scaling functions are probability densities (useful for multiscale Markov chains)
  • Top-bottom constructions: recursive subdivision
  • Both...

Applications:

  • Document organization and classification
  • Markov Decision Processes
  • Nonlinear Analysis of Images
  • Semi-supervised learning through diffusion processes on data

SLIDE 32

Acknowledgements

  • R.R. Coifman [Diffusion geometry; Diffusion wavelets; Uniformization via eigenfunctions; Multiscale Data Analysis], P.W. Jones (Yale Math), S.W. Zucker (Yale CS) [Diffusion geometry];
  • P.W. Jones (Yale Math), R. Schul (UCLA) [Uniformization via eigenfunctions; nonhomogeneous Brownian motion];
  • S. Mahadevan (U.Mass CS) [Markov decision processes];
  • A.D. Szlam (UCLA) [Diffusion wavelet packets, top-bottom multiscale analysis, linear and nonlinear image denoising, classification algorithms based on diffusion];
  • G.L. Davis (Yale Pathology), R.R. Coifman, F.J. Warner (Yale Math), F.B. Geshwind, A. Coppi, R. DeVerse (Plain Sight Systems) [Hyperspectral Pathology];
  • H. Mhaskar (Cal State, LA) [polynomial frames of diffusion wavelets];
  • J.C. Bremer (Yale) [Diffusion wavelet packets, biorthogonal diffusion wavelets];
  • M. Mahoney, P. Drineas (Yahoo Research) [Randomized algorithms for hyper-spectral imaging];
  • J. Mattingly, S. Mukherjee and Q. Wu (Duke Math, Stat, ISDS) [stochastic systems and learning];
  • A. Lin, E. Monson (Duke Phys.) [Neuron-glia cell modeling];
  • D. Brady, R. Willett (Duke EE) [Compressed sensing and imaging].

Funding: NSF, ONR.

Thank you! www.math.duke.edu/~mauro

SLIDE 33

IPAM PROGRAM, FALL 2008

Internet Multi-Resolution Analysis: Foundations, Applications and Practice.

Upcoming interdisciplinary program at IPAM. Running Sep. 8-Dec. 2, 2008. For more information: www.ipam.ucla.edu/programs/mra2008/
