Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization - PowerPoint PPT Presentation



SLIDE 1

Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization

Maxim Raginsky and Svetlana Lazebnik Beckman Institute and University of Illinois

SLIDE 2

Dimensionality Estimation from Samples

[Figure: a 2-dimensional manifold embedded in 3-dimensional space; D = 3, d = 2]

Problem: Estimate the intrinsic dimensionality d of a manifold M embedded in D-dimensional space (d < D), given n i.i.d. samples from M.

SLIDE 3

Previous Work

Bennett (1969), Grassberger and Procaccia (1983), Camastra and Vinciarelli (2002), Brand (2003), Kegl (2003), Costa and Hero (2004), Levina and Bickel (2005)

Key idea: for data uniformly distributed on a d-dimensional smooth compact submanifold of $\mathbb{R}^D$, the probability of a small ball of radius $\varepsilon$ around any point on the manifold is $\Theta(\varepsilon^d)$. When only finitely many samples are available, these probabilities must be estimated, e.g., from nearest-neighbor distances.

Shortcomings: negative bias (especially in high extrinsic dimensions); behavior in the presence of noise is poorly understood.
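As a minimal numerical illustration of the ball-probability idea (our own sketch, not the estimator of any of the papers above), the following samples points uniformly on a circle (d = 1) embedded in $\mathbb{R}^2$ and recovers d from the log-log slope of the empirical ball probability; all variable names are ours.

```python
import numpy as np

# Sample n points uniformly on the unit circle: intrinsic d = 1, ambient D = 2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 20000)
points = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Empirical probability of a ball of radius eps around a fixed manifold point.
center = np.array([1.0, 0.0])
radii = np.array([0.05, 0.1, 0.2])
dist = np.linalg.norm(points - center, axis=1)
probs = np.array([(dist < eps).mean() for eps in radii])

# P(ball of radius eps) = Theta(eps^d), so the log-log slope estimates d.
d_hat = np.polyfit(np.log(radii), np.log(probs), 1)[0]
```

With these sample sizes the slope comes out close to 1, the intrinsic dimension of the circle, even though the ambient dimension is 2.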

SLIDE 4

Our Approach: High-Resolution VQ

Key idea: when the data lying on a d-dimensional submanifold of $\mathbb{R}^D$ are optimally vector-quantized with a large number k of codevectors, the quantizer error scales approximately as $C \cdot k^{-1/d}$.

Advantages:

  • Can use simple and efficient techniques for empirical VQ design
  • Can ensure statistical consistency by using an independent test sequence
  • Effects of additive noise can be simply analyzed and understood

SLIDE 5

The Basics of Vector Quantization

A D-dimensional k-point quantizer $Q_k$ maps a vector $x \in \mathbb{R}^D$ to one of k codevectors $y_i \in \mathbb{R}^D$, $1 \le i \le k$; $\log k$ is the quantizer rate (bits/vector).

Average distortion of a VQ $Q_k$, for $r \in [1, \infty)$:

$\delta_r(Q_k|\mu) = E_\mu[\|X - Q_k(X)\|^r] \equiv E_\mu[\rho_r(X, Q_k(X))]$

Average error: $e_r(Q_k|\mu) = [\delta_r(Q_k|\mu)]^{1/r}$

Optimality ($\mathcal{Q}_k$ is the set of all D-dimensional k-point VQ's):

$e_r^*(k|\mu) = \inf_{Q_k \in \mathcal{Q}_k} e_r(Q_k|\mu)$
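The basic definitions of quantization and average error can be sketched in a few lines; this is our own sketch, with $\rho_r(x, y) = \|x - y\|^r$ for the Euclidean norm, and the function names are ours.

```python
import numpy as np

def quantize(x, codebook):
    # Q_k: map each point to its nearest codevector y_i.
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook[d2.argmin(axis=1)]

def avg_error(x, codebook, r=2):
    # e_r(Q_k|mu) = (E_mu ||X - Q_k(X)||^r)^(1/r), estimated on samples x.
    errs = np.linalg.norm(x - quantize(x, codebook), axis=1)
    return (errs ** r).mean() ** (1.0 / r)

# Two codevectors (rate log2(2) = 1 bit/vector); each sample lies exactly
# 0.1 away from its nearest codevector, so the average error is 0.1 for any r.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
samples = np.array([[0.1, 0.0], [0.9, 1.0]])
```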

SLIDE 6

Quantization Dimension

M is a smooth compact d-dimensional manifold embedded in $\mathbb{R}^D$, and $\mu$ is a regular probability distribution on M: $\Pr(\text{ball of radius } \varepsilon) = \Theta(\varepsilon^d)$.

High-rate approximation (HRA): VQ cells can be approximated by balls, and the optimal VQ error satisfies

$e_r^*(k|\mu) = \Theta(k^{-1/d})$   (Zador, 1982; Graf & Luschgy, 2000)

Quantization dimension of $\mu$ of order r:

$d_r(\mu) = -\lim_{k \to \infty} \frac{\log k}{\log e_r^*(k|\mu)}$

The limit exists for all $r \in [1, \infty)$ and is equal to the intrinsic dimension d of the manifold M.

VQ literature: assume d known and study the asymptotics of $e_r^*(k|\mu)$ as $k \to \infty$. Our work: observe the empirical VQ error in the high-rate regime and estimate d.

SLIDE 7

Estimating the Quantization Dimension

Given a training sequence $X^n = (X_1, X_2, \ldots, X_n)$ of i.i.d. samples from M and an independent test sequence $Z^m = (Z_1, \ldots, Z_m)$:

1. For each k in a range of codebook sizes for which the HRA is valid:

  • Training: use $X^n$ to learn a k-point VQ $\hat{Q}_k$ that minimizes $\frac{1}{n} \sum_{i=1}^n \rho_r(X_i, \hat{Q}_k(X_i))$

  • Testing: run $\hat{Q}_k$ on $Z^m$ and approximate $e_r^*(k|\mu)$ by $\hat{e}_r(k) = \left[ \frac{1}{m} \sum_{i=1}^m \rho_r(Z_i, \hat{Q}_k(Z_i)) \right]^{1/r}$

2. Plot $-\log \hat{e}_r(k)$ vs. $\log k$ and estimate d from the slope of the (linear) plot over the chosen range of k.

SLIDE 8

Statistical Consistency

Minimizing the training error $\frac{1}{n} \sum_{i=1}^n \rho_r(X_i, \hat{Q}_k(X_i))$ is necessary to approximate the optimal quantizer for $\mu$. However, the training error is an optimistically biased estimate of $e_r^*(k|\mu)$ (the empirical VQ overfits the training sequence).

Therefore, in order to obtain a statistically consistent estimate of $e_r^*(k|\mu)$, we need to measure the VQ error on an independent test sequence:

$\hat{e}_r(k) = \left[ \frac{1}{m} \sum_{i=1}^m \rho_r(Z_i, \hat{Q}_k(Z_i)) \right]^{1/r} \approx e_r(\hat{Q}_k|\mu)$ by the law of large numbers.

Statistical consistency follows since $e_r(\hat{Q}_k|\mu) \to e_r^*(k|\mu)$ almost surely as $n \to \infty$.
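The optimistic bias of the training error is visible even for k = 1, where the optimal codevector for a training set (under r = 2) is its mean. A small sketch of our own construction, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
train_err, test_err = [], []
for _ in range(500):
    X_train = rng.normal(size=(10, 3))    # small training sequence X^n
    Z_test = rng.normal(size=(2000, 3))   # independent test sequence Z^m
    c = X_train.mean(axis=0)              # optimal 1-point codebook for X^n (r = 2)
    train_err.append(((X_train - c) ** 2).sum(axis=1).mean())
    test_err.append(((Z_test - c) ** 2).sum(axis=1).mean())

# The empirical VQ overfits: on average, the training distortion
# underestimates the distortion measured on fresh data.
bias_gap = np.mean(test_err) - np.mean(train_err)
```

Averaged over trials, the test distortion is systematically larger than the training distortion, which is why the method measures $\hat{e}_r(k)$ on an independent test sequence.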

SLIDE 9

Results on Synthetic Data, r = 2

SLIDE 10

The r = ∞ Limit and Packing Numbers

M is compact, so $e_\infty(Q_k|\mu) = \lim_{r \to \infty} e_r(Q_k|\mu)$ exists, and

$e_\infty(Q_k|\mu) = \max_{x \in M} \|x - Q_k(x)\|$

(the worst-case quantization error of X by $Q_k$, independent of $\mu$). The optimum $e_\infty^*(k|\mu)$ is the smallest radius of the most economical covering of M by k or fewer balls.

The $r \to \infty$ limit of our scheme is equivalent to Kegl's method based on covering/packing numbers (NIPS 2003):

covering numbers: $d_{\mathrm{cap}} = -\lim_{\varepsilon \to 0} \frac{\log N_M(\varepsilon)}{\log \varepsilon}$

worst-case VQ error: $d_\infty = -\lim_{k \to \infty} \frac{\log k}{\log e_\infty^*(k|\mu)}$

SLIDE 11

Choice of Distortion Exponent

For finite $r \neq 2$, empirical VQ design is hard: the optimal codevectors for a particular VQ partition are not given by the centroids of the partition regions. This makes r = 2 a preferred choice. The limiting case $r = \infty$ is attractive because of its robustness against variations in the sampling density, but this is offset by increased sensitivity to noise.

SLIDE 12

Effect of Noise

Additive isotropic Gaussian noise: $X \sim \mu$ is a point on the manifold, $W \sim \mathcal{N}(0, \sigma^2 I)$ is independent of X, and we observe noisy samples Y = X + W. For large n, the estimation error for $e_r^*(k|\mu)$ is bounded by

$\sqrt{2}\,\sigma \left[ \frac{\Gamma((r + D)/2)}{\Gamma(D/2)} \right]^{1/r} + o(1)$

For r = 2, the bound becomes $\sigma\sqrt{D}$.

[Figure: bound vs. actual estimation error]

SLIDE 13

Results on Real Data

Handwritten digits: MNIST data set, http://yann.lecun.com/exdb/mnist Faces: http://www.cs.toronto.edu/~roweis/data.html, courtesy of B. Frey and S. Roweis

SLIDE 14

Summary

  • The use of an independent test set helps to avoid negative bias.
  • The limiting case (r = ∞) is equivalent to a previous method based on packing numbers.
  • Our method can be seamlessly integrated with a VQ-based technique for dimensionality reduction (Raginsky, ISIT 2005).
  • Application: find a clustering of the data that follows the local neighborhood structure of the manifold (i.e., clusters are locally d-dimensional).