

  1. Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
     Maxim Raginsky and Svetlana Lazebnik, Beckman Institute and University of Illinois

  2. Dimensionality Estimation from Samples
     Problem: Estimate the intrinsic dimensionality d of a manifold M embedded in D-dimensional space (d < D), given n i.i.d. samples from M.
     [Illustration: D = 3, d = 2.]

  3. Previous Work
     Key idea: for data uniformly distributed on a d-dimensional smooth compact submanifold of R^D, the probability of a small ball of radius ε around any point on the manifold is Θ(ε^d). When only finitely many samples are available, these probabilities must be estimated, e.g., from nearest-neighbor distances.
     Bennett (1969), Grassberger and Procaccia (1983), Camastra and Vinciarelli (2002), Brand (2003), Kegl (2003), Costa and Hero (2004), Levina and Bickel (2005).
     Shortcomings: negative bias (especially in high extrinsic dimensions); behavior in the presence of noise is poorly understood.
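The small-ball scaling Θ(ε^d) behind these methods can be illustrated numerically. Below is a minimal numpy sketch (not part of the presentation; `ball_count_dimension` is a hypothetical helper) that estimates d from the ratio of pairwise-distance counts at two radii, in the spirit of Grassberger and Procaccia:

```python
import numpy as np

def ball_count_dimension(points, eps_small, eps_large):
    """Estimate intrinsic dimension from small-ball scaling: if
    Pr(ball of radius eps) = Theta(eps^d), the fraction of point pairs
    within distance eps scales roughly like eps^d."""
    diffs = points[:, None, :] - points[None, :, :]   # O(n^2) memory; fine for small n
    dists = np.sqrt((diffs ** 2).sum(-1))
    iu = np.triu_indices(len(points), k=1)            # each pair counted once
    d = dists[iu]
    c_small = np.mean(d < eps_small)                  # empirical pair fraction C(eps)
    c_large = np.mean(d < eps_large)
    # d ~ log(C(eps2)/C(eps1)) / log(eps2/eps1)
    return np.log(c_large / c_small) / np.log(eps_large / eps_small)

rng = np.random.default_rng(0)
# A flat 2-dimensional "manifold" (unit square) embedded in R^3.
xy = rng.random((1500, 2))
pts = np.column_stack([xy, np.zeros(1500)])
d_hat = ball_count_dimension(pts, 0.05, 0.1)          # close to 2, with boundary bias
```

Boundary effects already produce a slight downward bias here, consistent with the shortcoming noted on the slide.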

  4. Our Approach: High-Resolution VQ
     Key idea: when data lying on a d-dimensional submanifold of R^D are optimally vector-quantized with a large number k of codevectors, the quantizer error scales approximately as C · k^{-1/d}.
     Advantages:
     • Can use simple and efficient techniques for empirical VQ design
     • Can ensure statistical consistency by using an independent test sequence
     • Effects of additive noise can be simply analyzed and understood

  5. The Basics of Vector Quantization
     A D-dimensional k-point quantizer Q_k maps a vector x ∈ R^D to one of k codevectors y_i ∈ R^D, 1 ≤ i ≤ k; log k is the quantizer rate (bits/vector).
     Average distortion of a VQ Q_k: δ_r(Q_k | µ) = E_µ[||X − Q_k(X)||^r] ≡ E_µ[ρ_r(X, Q_k(X))], r ∈ [1, ∞).
     Average error: e_r(Q_k | µ) = [δ_r(Q_k | µ)]^{1/r}.
     Optimality: e*_r(k | µ) = inf_{Q_k ∈ Q_k} e_r(Q_k | µ), where Q_k denotes the set of all D-dimensional k-point VQs.
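To make the definitions concrete, here is a minimal numpy sketch (illustrative only; `vq_error` is a hypothetical helper, not from the slides) that computes the empirical error e_r of a given codebook by nearest-codevector assignment:

```python
import numpy as np

def vq_error(x, codebook, r=2.0):
    """Empirical r-th order VQ error e_r: map each sample to its nearest
    codevector, average rho_r(x, Q(x)) = ||x - Q(x)||^r, take the 1/r root."""
    # Distance from every sample to every codevector.
    d = np.linalg.norm(x[:, None, :] - codebook[None, :, :], axis=-1)
    nearest = d.min(axis=1)               # ||x - Q_k(x)|| for each sample
    return np.mean(nearest ** r) ** (1.0 / r)

rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 3))
codebook = rng.normal(size=(16, 3))       # an arbitrary (non-optimal) 16-point VQ
err = vq_error(x, codebook)               # e_2 of this quantizer on the sample
```

If the codebook contains every sample, the error is exactly zero, matching the definition.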

  6. Quantization Dimension
     M is a smooth compact d-dimensional manifold embedded in R^D; µ is a regular probability distribution on M: Pr(ball of radius ε) = Θ(ε^d).
     High-rate approximation (HRA): VQ cells can be approximated by balls, and the optimal VQ error satisfies e*_r(k | µ) = Θ(k^{-1/d}).
     Quantization dimension of µ of order r: d_r(µ) = −lim_{k→∞} log k / log e*_r(k | µ) (Zador, 1982; Graf & Luschgy, 2000). It exists for all r ∈ [1, ∞) and in the limit as r → ∞, and is equal to the intrinsic dimension d of the manifold M.
     VQ literature: assume d known and study the asymptotics of e*_r(k | µ).
     Our work: observe the empirical VQ error in the high-rate regime and estimate d.

  7. Estimating the Quantization Dimension
     Given a training sequence X^n = (X_1, X_2, ..., X_n) of i.i.d. samples from M and an independent test sequence Z^m = (Z_1, ..., Z_m):
     1. For each k in a range of codebook sizes for which the HRA is valid:
        - Training: use X^n to learn a k-point VQ Q̂_k minimizing (1/n) Σ_{i=1}^n ρ_r(X_i, Q̂_k(X_i))
        - Testing: run Q̂_k on Z^m and approximate e*_r(k | µ) by ê_r(k) = [(1/m) Σ_{i=1}^m ρ_r(Z_i, Q̂_k(Z_i))]^{1/r}
     2. Plot −log ê_r(k) vs. log k and estimate d from the slope of the (linear) plot over the chosen range of k.
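A runnable sketch of this two-step procedure, assuming plain Lloyd's k-means for the r = 2 VQ design (the slides do not prescribe a particular design algorithm, and the function names here are hypothetical):

```python
import numpy as np

def kmeans(x, k, iters=30, seed=0):
    """Plain Lloyd's algorithm: for r = 2 the optimal codevectors of a
    fixed partition are the region centroids."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def estimate_dimension(train, test, ks):
    """Fit VQs on the training set, measure test error e_2(k), and read d
    off the slope of -log e_2(k) vs. log k."""
    errs = []
    for k in ks:
        centers = kmeans(train, k)
        d = np.linalg.norm(test[:, None, :] - centers[None, :, :], axis=-1)
        errs.append(np.sqrt(np.mean(d.min(axis=1) ** 2)))   # test error e_2(k)
    slope = np.polyfit(np.log(ks), -np.log(errs), 1)[0]     # slope ~ 1/d
    return 1.0 / slope

rng = np.random.default_rng(2)
def sphere(n):                    # uniform samples on the unit 2-sphere in R^3
    g = rng.normal(size=(n, 3))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

d_hat = estimate_dimension(sphere(4000), sphere(4000), ks=[8, 16, 32, 64])
```

For the sphere (d = 2 embedded in D = 3), the estimate lands near 2; the quality of the fit depends on the chosen range of k, as the slide notes.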

  8. Statistical Consistency
     Minimizing the training error (1/n) Σ_{i=1}^n ρ_r(X_i, Q̂_k(X_i)) is necessary to approximate the optimal quantizer for µ.
     However, the training error is an optimistically biased estimate of e*_r(k | µ): the empirical VQ overfits the training sequence.
     Therefore, to obtain a statistically consistent estimate of e*_r(k | µ), we need to measure the VQ error on an independent test sequence: ê_r(k) = [(1/m) Σ_{i=1}^m ρ_r(Z_i, Q̂_k(Z_i))]^{1/r} ≈ e_r(Q̂_k | µ) by the law of large numbers.
     Statistical consistency follows since e_r(Q̂_k | µ) → e*_r(k | µ) a.s. as n → ∞.
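The optimistic bias of the training error is easy to observe. In this numpy sketch (illustrative, not from the slides; `lloyd` and `e2` are hypothetical helpers), a k-means VQ fit on a small training set reports a lower error on its own training data than on a large independent test sequence:

```python
import numpy as np

def lloyd(x, k, iters=25, seed=0):
    """Lloyd's algorithm: empirically minimizes the r = 2 training error."""
    rng = np.random.default_rng(seed)
    c = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        lab = np.linalg.norm(x[:, None] - c[None], axis=-1).argmin(1)
        for j in range(k):
            if np.any(lab == j):
                c[j] = x[lab == j].mean(0)
    return c

def e2(x, c):
    """Empirical error e_2 of codebook c on sample x."""
    return np.sqrt(np.mean(np.linalg.norm(x[:, None] - c[None], axis=-1).min(1) ** 2))

rng = np.random.default_rng(4)
x_train = rng.normal(size=(300, 5))   # small n relative to k: the VQ overfits
z_test = rng.normal(size=(5000, 5))   # independent test sequence
c = lloyd(x_train, k=50)
train_err, test_err = e2(x_train, c), e2(z_test, c)   # train_err < test_err
```

The gap between the two numbers is exactly the bias that the independent test sequence removes.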

  9. Results on Synthetic Data, r = 2

  10. The r = ∞ Limit and Packing Numbers
     M is compact, so e_∞(Q_k | µ) = lim_{r→∞} e_r(Q_k | µ) exists, and e_∞(Q_k | µ) = max_{x ∈ M} ||x − Q_k(x)|| (the worst-case quantization error of X by Q_k, independent of µ).
     The optimum e*_∞(k | µ) is the smallest radius of the most economical covering of M by k or fewer balls.
     Worst-case VQ error: d_∞ = −lim_{k→∞} log k / log e*_∞(k | µ).  Covering numbers: d_cap = −lim_{ε→0} log N_M(ε) / log ε.
     The r → ∞ limit of our scheme is equivalent to Kegl's method based on covering/packing numbers (NIPS 2003).
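The covering number N_M(ε) can be approximated with a greedy ε-net, which is the flavor of computation behind packing/covering-number methods. A minimal sketch (illustrative; `greedy_cover` is a hypothetical helper) on a densely sampled circle, a 1-dimensional manifold in R^2:

```python
import numpy as np

def greedy_cover(points, eps):
    """Size of a greedy eps-net: repeatedly take an uncovered point as a
    center and remove everything within eps of it. Approximates N_M(eps)
    up to a constant factor."""
    remaining = points
    n_centers = 0
    while len(remaining):
        center = remaining[0]
        remaining = remaining[np.linalg.norm(remaining - center, axis=1) > eps]
        n_centers += 1
    return n_centers

# Dense samples from the unit circle.
t = np.linspace(0, 2 * np.pi, 5000, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
n1, n2 = greedy_cover(circle, 0.2), greedy_cover(circle, 0.1)
# Slope of log N_M(eps) vs. log(1/eps); close to 1 for the circle.
d_cap_hat = np.log(n2 / n1) / np.log(2.0)
```

Halving ε roughly doubles the net size here, giving d_cap ≈ 1 as expected.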

  11. Choice of Distortion Exponent
     For finite r ≠ 2, empirical VQ design is hard: the optimal codevectors for a particular VQ partition are not given by the centroids of the partition regions. This makes r = 2 a preferred choice.
     The limiting case r = ∞ is attractive because of its robustness against variations in the sampling density, but this is offset by increased sensitivity to noise.

  12. Effect of Noise
     Additive isotropic Gaussian noise: X ∼ µ is a point on the manifold; W ∼ N(0, σ²I) is independent of X; the noisy samples are Y = X + W.
     For large n, the estimation error for e*_r(k | µ) is bounded by √2 σ [Γ((r + D)/2) / Γ(D/2)]^{1/r} + o(1).
     For r = 2, the bound becomes σ√D.
     [Plot: bound vs. actual error.]
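The r = 2 case of the bound can be checked numerically. In this sketch (illustrative, not from the slides; `noise_bound` is a hypothetical helper), the Γ-ratio expression is evaluated via log-gamma for stability and compared against σ√D and a Monte Carlo estimate of (E||W||²)^{1/2}:

```python
import math
import numpy as np

def noise_bound(sigma, D, r):
    """sqrt(2) * sigma * (Gamma((r+D)/2) / Gamma(D/2))^(1/r), computed
    through log-gamma to avoid overflow for large D."""
    return math.sqrt(2) * sigma * math.exp(
        (math.lgamma((r + D) / 2) - math.lgamma(D / 2)) / r)

sigma, D = 0.1, 10
b2 = noise_bound(sigma, D, r=2)           # algebraically equals sigma * sqrt(D)

# Monte Carlo check: (E||W||^2)^(1/2) for W ~ N(0, sigma^2 I).
rng = np.random.default_rng(5)
w = rng.normal(scale=sigma, size=(200_000, D))
rms = np.sqrt(np.mean(np.sum(w ** 2, axis=1)))   # matches b2 closely
```

This confirms the simplification on the slide: for r = 2 the Γ-ratio collapses to D/2, giving σ√D exactly.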

  13. Results on Real Data
     Handwritten digits: MNIST data set, http://yann.lecun.com/exdb/mnist
     Faces: http://www.cs.toronto.edu/~roweis/data.html, courtesy of B. Frey and S. Roweis

  14. Summary
     • The use of an independent test set helps to avoid negative bias.
     • The limiting case (r = ∞) is equivalent to a previous method based on packing numbers.
     • Our method can be seamlessly integrated with a VQ-based technique for dimensionality reduction (Raginsky, ISIT 2005).
     • Application: find a clustering of the data that follows the local neighborhood structure of the manifold (i.e., clusters are locally d-dimensional).
