Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
Maxim Raginsky and Svetlana Lazebnik
Beckman Institute, University of Illinois

Dimensionality Estimation from Samples

Problem: Estimate the intrinsic dimensionality d of a manifold M embedded in D-dimensional space (d < D), given n i.i.d. samples from M. (Illustration: a surface with D = 3, d = 2.)
Previous Work

Bennett (1969), Grassberger and Procaccia (1983), Camastra and Vinciarelli (2002), Brand (2003), Kegl (2003), Costa and Hero (2004), Levina and Bickel (2005)

Key idea: for data uniformly distributed on a d-dimensional smooth compact submanifold of $\mathbb{R}^D$, the probability of a small ball of radius $\varepsilon$ around any point on the manifold is $\Theta(\varepsilon^d)$. When only finitely many samples are available, these probabilities must be estimated, e.g., from nearest-neighbor distances.

Shortcomings: negative bias (especially in high extrinsic dimensions); behavior in the presence of noise is poorly understood.
Our Approach: High-Resolution VQ

Key idea: when data lying on a d-dimensional submanifold of $\mathbb{R}^D$ are optimally vector-quantized with a large number k of codevectors, the quantizer error scales approximately as $C \cdot k^{-1/d}$.

Advantages:
- Can use simple and efficient techniques for empirical VQ design
- Can ensure statistical consistency by using an independent test sequence
- Effects of additive noise can be simply analyzed and understood
The Basics of Vector Quantization

A D-dimensional k-point quantizer $Q_k$ maps a vector $x \in \mathbb{R}^D$ to one of k codevectors $y_i \in \mathbb{R}^D$, $1 \le i \le k$; $\log k$ is the quantizer rate (bits/vector).

Average distortion of a VQ $Q_k$, for $r \in [1, \infty)$:
$\delta_r(Q_k|\mu) = E_\mu[\|X - Q_k(X)\|^r] \equiv E_\mu[\rho_r(X, Q_k(X))]$

Average error: $e_r(Q_k|\mu) = [\delta_r(Q_k|\mu)]^{1/r}$

Optimality: with $\mathcal{Q}_k$ the set of all D-dimensional k-point VQs,
$e_r^*(k|\mu) = \inf_{Q_k \in \mathcal{Q}_k} e_r(Q_k|\mu)$
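The distortion and average-error definitions above can be sketched directly from samples; the function names and test data here are illustrative, not from the original:

```python
# Sketch: average r-th power distortion and average error e_r of a fixed
# k-point quantizer, estimated from samples (illustrative names).
import numpy as np

def quantize(x, codebook):
    """Map each row of x to its nearest codevector (Euclidean norm)."""
    # Pairwise distances between samples and codevectors.
    d = np.linalg.norm(x[:, None, :] - codebook[None, :, :], axis=2)
    return codebook[np.argmin(d, axis=1)]

def avg_error(x, codebook, r=2.0):
    """e_r(Q_k) = (E ||X - Q_k(X)||^r)^(1/r), estimated from samples x."""
    q = quantize(x, codebook)
    rho = np.linalg.norm(x - q, axis=1) ** r
    return np.mean(rho) ** (1.0 / r)

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3))                     # D = 3 samples
codebook = x[rng.choice(500, size=8, replace=False)]  # k = 8 codevectors
print(avg_error(x, codebook, r=2.0))
```

A practical VQ design would then search for the codebook minimizing this error, which is what the training step below does.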
Quantization Dimension

M is a smooth compact d-dimensional manifold embedded in $\mathbb{R}^D$; $\mu$ is a regular probability distribution on M: $\Pr(\text{ball of radius } \varepsilon) = \Theta(\varepsilon^d)$.

High-rate approximation (HRA): VQ cells can be approximated by balls, and the optimal VQ error satisfies $e_r^*(k|\mu) = \Theta(k^{-1/d})$ (Zador, 1982; Graf & Luschgy, 2000).

Quantization dimension of $\mu$ of order r:
$d_r(\mu) = -\lim_{k \to \infty} \dfrac{\log k}{\log e_r^*(k|\mu)}$

$d_r(\mu)$ exists for all $r \in [1, \infty)$ and in the limit as $r \to \infty$, and is equal to the intrinsic dimension d of the manifold M.

VQ literature: assume d known and study the asymptotics of $e_r^*(k|\mu)$.
Our work: observe the empirical VQ error in the high-rate regime and estimate d.
Estimating the Quantization Dimension

Given a training sequence $X^n = (X_1, X_2, \ldots, X_n)$ of i.i.d. samples from M and an independent test sequence $Z^m = (Z_1, \ldots, Z_m)$:

1. For each k in a range of codebook sizes for which the HRA is valid:
   - Training: use $X^n$ to learn a k-point VQ $\hat{Q}_k$ minimizing
     $\frac{1}{n} \sum_{i=1}^n \rho_r(X_i, \hat{Q}_k(X_i))$
   - Testing: run $\hat{Q}_k$ on $Z^m$ and approximate $e_r^*(k|\mu)$ by
     $\hat{e}_r(k) = \left[\frac{1}{m} \sum_{i=1}^m \rho_r(Z_i, \hat{Q}_k(Z_i))\right]^{1/r}$

2. Plot $-\log \hat{e}_r(k)$ vs. $\log k$ and estimate d from the slope of the (linear) plot over the chosen range of k.
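The two-step procedure above can be sketched end to end for r = 2, using plain k-means as the empirical VQ design step; all function names, the choice of Lloyd iterations, and the synthetic manifold are illustrative assumptions, not part of the original:

```python
# Sketch of the estimator (r = 2): train VQs for several k, measure test
# error, and read off d from the slope of log k vs. -log e_hat_2(k).
import numpy as np

def kmeans(train, k, iters=50, seed=0):
    """Plain Lloyd iterations: returns a k-point codebook."""
    rng = np.random.default_rng(seed)
    codebook = train[rng.choice(len(train), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(train[:, None] - codebook[None], axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            pts = train[labels == j]
            if len(pts):
                codebook[j] = pts.mean(axis=0)
    return codebook

def estimate_dimension(train, test, ks):
    """Fit the slope of log k against -log e_hat_2(k) over the chosen ks."""
    log_k, neg_log_e = [], []
    for k in ks:
        codebook = kmeans(train, k)
        d = np.linalg.norm(test[:, None] - codebook[None], axis=2)
        err = np.sqrt(np.mean(np.min(d, axis=1) ** 2))  # e_hat_2(k)
        log_k.append(np.log(k))
        neg_log_e.append(-np.log(err))
    # Slope d(log k)/d(-log e) is the dimension estimate.
    return np.polyfit(neg_log_e, log_k, 1)[0]

# Synthetic 2-D manifold (a flat unit square) embedded in D = 5 dimensions.
rng = np.random.default_rng(1)
embed = lambda u: np.hstack([u, np.zeros((len(u), 3))])
train, test = embed(rng.random((2000, 2))), embed(rng.random((1000, 2)))
print(estimate_dimension(train, test, ks=[16, 32, 64, 128]))
```

For this flat square the estimate should come out near the intrinsic dimension 2, though k-means suboptimality and the finite range of k introduce some error.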
Statistical Consistency

Minimizing the training error $\frac{1}{n} \sum_{i=1}^n \rho_r(X_i, \hat{Q}_k(X_i))$ is necessary to approximate the optimal quantizer for $\mu$.

However, the training error is an optimistically biased estimate of $e_r^*(k|\mu)$ (the empirical VQ overfits the training sequence).

Therefore, to obtain a statistically consistent estimate of $e_r^*(k|\mu)$, we need to measure the VQ error on an independent test sequence:
$\hat{e}_r(k) = \left[\frac{1}{m} \sum_{i=1}^m \rho_r(Z_i, \hat{Q}_k(Z_i))\right]^{1/r} \approx e_r(\hat{Q}_k|\mu)$ by the law of large numbers.

Statistical consistency follows since $e_r(\hat{Q}_k|\mu) \to e_r^*(k|\mu)$ a.s. as $n \to \infty$.
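The optimistic bias of the training error can be seen in a minimal toy sketch (names and data are illustrative): in the extreme case k = n, a codebook that simply memorizes the training points drives the training error to zero, while the test error stays positive.

```python
# Minimal illustration of training-error bias: a codebook fit to (here,
# equal to) the training set has zero training error but nonzero test error.
import numpy as np

def e2(samples, codebook):
    """Empirical e_2 error of nearest-codevector quantization."""
    d = np.linalg.norm(samples[:, None] - codebook[None], axis=2)
    return np.sqrt(np.mean(np.min(d, axis=1) ** 2))

rng = np.random.default_rng(0)
train = rng.random((64, 2))   # n = k = 64: codebook memorizes training set
test = rng.random((500, 2))
codebook = train.copy()

print(e2(train, codebook))    # 0.0 -- optimistically biased
print(e2(test, codebook))     # > 0 -- honest estimate on held-out data
```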
Results on Synthetic Data, r = 2
The Limit r = ∞ and Packing Numbers

M is compact, so $e_\infty(Q_k|\mu) = \lim_{r \to \infty} e_r(Q_k|\mu)$ exists, and
$e_\infty(Q_k|\mu) = \max_{x \in M} \|x - Q_k(x)\|$
(the worst-case quantization error of X by $Q_k$, independent of $\mu$).

The optimum $e_\infty^*(k|\mu)$ is the smallest radius of the most economical covering of M by k or fewer balls.

The $r \to \infty$ limit of our scheme is equivalent to Kegl's method based on covering/packing numbers (NIPS 2003):
covering numbers: $d_{\mathrm{cap}} = -\lim_{\varepsilon \to 0} \dfrac{\log N_M(\varepsilon)}{\log \varepsilon}$
worst-case VQ error: $d_\infty = -\lim_{k \to \infty} \dfrac{\log k}{\log e_\infty^*(k|\mu)}$
Choice of Distortion Exponent

For finite $r \neq 2$, empirical VQ design is hard: the optimal codevectors for a given VQ partition are not the centroids of the partition regions. This makes r = 2 a preferred choice.

The limiting case $r = \infty$ is attractive because of its robustness against variations in the sampling density, but this is offset by an increased sensitivity to noise.
Effect of Noise

Additive isotropic Gaussian noise: $X \sim \mu$ is a point on the manifold, $W \sim N(0, \sigma^2 I)$ is independent of X, and the noisy samples are $Y = X + W$. For large n, the estimation error for $e_r^*(k|\mu)$ is bounded by
$\sqrt{2}\,\sigma \left[\dfrac{\Gamma((r + D)/2)}{\Gamma(D/2)}\right]^{1/r} + o(1).$
For r = 2, the bound becomes $\sigma\sqrt{D}$.
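The r = 2 simplification follows from $\Gamma(D/2 + 1) = (D/2)\,\Gamma(D/2)$, so the bracket equals $D/2$ and the bound reduces to $\sigma\sqrt{D}$. A quick numeric check (with illustrative values of σ and D):

```python
# Check: sqrt(2)*sigma*(Gamma((r+D)/2)/Gamma(D/2))^(1/r) equals
# sigma*sqrt(D) at r = 2, since the Gamma ratio is then D/2.
import math

def noise_bound(sigma, D, r):
    g = math.gamma((r + D) / 2) / math.gamma(D / 2)
    return math.sqrt(2) * sigma * g ** (1.0 / r)

sigma, D = 0.1, 16
print(noise_bound(sigma, D, r=2))  # analytically sigma * sqrt(D) = 0.4
print(sigma * math.sqrt(D))
```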
Results on Real Data

Handwritten digits: MNIST data set, http://yann.lecun.com/exdb/mnist
Faces: http://www.cs.toronto.edu/~roweis/data.html, courtesy of B. Frey and S. Roweis
Summary

- The use of an independent test set helps to avoid negative bias.
- The limiting case (r = ∞) is equivalent to a previous method based on packing numbers.
- Our method can be seamlessly integrated with a VQ-based technique for dimensionality reduction (Raginsky, ISIT 2005).
- Application: find a clustering of the data that follows the local