

  1. Manifold-Adaptive Dimension Estimation. Amir-massoud Farahmand (1), Csaba Szepesvári (1), Jean-Yves Audibert (2). (1) Department of Computing Science, University of Alberta, Canada; (2) CERTIS, École Nationale des Ponts, France

  2. High-Dimensional Data Everywhere • Vision • Sensor Fusion • Feature Expansion • Kernel • ...

  3. Curse of Dimensionality [Plot: mean squared error vs. number of samples, log-log axes, for D = 1, D = 5, and D = 100]

  4. Practical Implications • Thou shall reduce the dimension of the data before working with it! • Thou shall not add features unnecessarily! • Thou shall not accept projects with high-dimensional data! • ... Wait!

  5. Regularities of Data • Smoothness • Sparsity • Low noise at boundary ✓ Lower-dimensional submanifold [Plot: data lying on a low-dimensional submanifold embedded in R^3] • LLE, IsoMap, Laplacian Eigenmap, Hessian Eigenmap, ... • Semi-supervised Learning, Reinforcement Learning, ...

  6. Goal • Manifold-adaptive machine learning methods • Convergence rate independent of the dimension of the input space

  7. Many open questions! Here: dimension estimation (:

  8. Why? • Needed in various learning methods • Not known a priori

  9. New? • Many existing methods [Pettis et al. (1979), Kégl (2002), Costa & Hero (2004), Levina & Bickel (2005), Hein & Audibert (2005)] • No rigorous analysis • Only an asymptotic result [Levina & Bickel (2005)]

  10. Our Contribution • New algorithm • K-NN • Manifold-adaptive convergence rate

  11. General Idea: $P(X_i \in B(x, r)) = \eta(x, r)\, r^d$
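The power law on this slide can be sanity-checked by simulation: for data on a 2-D manifold embedded in R^3, the empirical ball probability scales like r^2, so the slope of log-probability against log-radius recovers d = 2 rather than the ambient 3. A minimal sketch (the flat-square manifold, the radii, and the sample size are illustrative choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 2-D manifold embedded in R^3: the unit square lying in the z = 0 plane.
n = 200_000
pts = np.column_stack([rng.random(n), rng.random(n), np.zeros(n)])

# Empirical ball probability P(X in B(x, r)) around an interior point.
center = np.array([0.5, 0.5, 0.0])
dist = np.linalg.norm(pts - center, axis=1)
r1, r2 = 0.05, 0.10
p1, p2 = np.mean(dist <= r1), np.mean(dist <= r2)

# If P = eta * r^d with eta roughly constant, the slope of ln P
# against ln r recovers the intrinsic dimension d.
d_hat = (np.log(p2) - np.log(p1)) / (np.log(r2) - np.log(r1))
print(d_hat)  # close to 2: the data fill a 2-D set, not R^3
```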

  12–16. Taking logarithms and plugging in the empirical k-NN distance estimates (built up over slides 12 to 16):

$P(X_i \in B(x, r)) = \eta(x, r)\, r^d$
$\ln P(X_i \in B(x, r)) = \ln \eta(x, r) + d \ln r$
$\ln(k/n) \approx \ln \eta_0 + d \ln \hat{r}_k(x)$
$\ln(k/(2n)) \approx \ln \eta_0 + d \ln \hat{r}_{\lceil k/2 \rceil}(x)$
$\hat{d}(x) = \dfrac{\ln 2}{\ln\big(\hat{r}_k(x) / \hat{r}_{\lceil k/2 \rceil}(x)\big)}$
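The estimator derived on these slides, $\hat{d}(x) = \ln 2 / \ln(\hat{r}_k(x)/\hat{r}_{\lceil k/2 \rceil}(x))$, only needs two nearest-neighbour distances. A brute-force sketch (the function name, the circle dataset, and the parameter choices are illustrative):

```python
import math
import numpy as np

def knn_dimension_estimate(data, x, k):
    """Point estimate  d_hat(x) = ln 2 / ln( r_k(x) / r_ceil(k/2)(x) ),
    where r_j(x) is the distance from x to its j-th nearest neighbour in data."""
    dist = np.sort(np.linalg.norm(data - x, axis=1))
    r_k = dist[k - 1]                     # k-th nearest-neighbour distance
    r_half = dist[math.ceil(k / 2) - 1]   # ceil(k/2)-th nearest-neighbour distance
    return math.log(2) / math.log(r_k / r_half)

# Illustrative data: a 1-D manifold (the unit circle) embedded in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 20_000)
data = np.column_stack([np.cos(theta), np.sin(theta)])

# Query a point on the circle (assumed not to be one of the samples).
d_hat = knn_dimension_estimate(data, np.array([1.0, 0.0]), k=500)
print(d_hat)  # typically close to 1, the circle's intrinsic dimension
```

A single query point gives a noisy answer; the deck's slide 19 addresses exactly this by aggregating estimates over many sample points.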

  17. Finite Sample Convergence Rate

$\hat{d}(X_i) = \dfrac{\ln 2}{\ln\big(\hat{r}^{(k)}(X_i) / \hat{r}^{(\lceil k/2 \rceil)}(X_i)\big)}$

Theorem: Under some regularity assumptions on $\eta$, provided that $n/k > \Omega(2^d)$, with probability at least $1 - \delta$,

$|\hat{d}(X_i) - d| \le O\left( d \left( \left(\tfrac{k}{n}\right)^{1/d} + \sqrt{\tfrac{\ln(4/\delta)}{k}} \right) \right).$

  18. Issues

$\hat{d}(X_i) = \dfrac{\ln 2}{\ln\big(\hat{r}^{(k)}(X_i) / \hat{r}^{(\lceil k/2 \rceil)}(X_i)\big)}$

• High variance of $\hat{d}(X_i)$ • Inefficient use of data: $r \ll 1 \implies k \ll n$

  19. Aggregation

• Averaging: $\hat{d}_{\mathrm{avg}} = \frac{1}{n} \sum_{i=1}^{n} \hat{d}(X_i)$
• Voting: $\hat{d}_{\mathrm{vote}} = \arg\max_{d'} \sum_{i=1}^{n} \mathbb{I}\{ [\hat{d}(X_i)] = d' \}$

Theorem:
$P\big(\hat{d}_{\mathrm{vote}} \ne d\big) \le c' n\, e^{-(k/(cd))^2},$
$P\big(\hat{d}_{\mathrm{avg}} \ne d\big) \le c'' n\, e^{-(k/(cdD))^2}.$
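Both aggregation rules on this slide are one-liners on top of the point estimates: average the $\hat{d}(X_i)$, or round each to an integer and take the majority. A sketch (the brute-force neighbour search and the sphere dataset are illustrative choices):

```python
import math
from collections import Counter
import numpy as np

def point_estimates(data, k):
    """d_hat(X_i) = ln 2 / ln( r_k(X_i) / r_ceil(k/2)(X_i) ) for every sample,
    with X_i itself excluded from its own neighbour list."""
    est = []
    for i in range(len(data)):
        dist = np.sort(np.linalg.norm(data - data[i], axis=1))[1:]  # drop self
        r_k = dist[k - 1]
        r_half = dist[math.ceil(k / 2) - 1]
        est.append(math.log(2) / math.log(r_k / r_half))
    return est

# Illustrative data: the unit sphere S^2 (intrinsic dimension 2) in R^3.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

est = point_estimates(pts, k=50)
d_avg = sum(est) / len(est)                                    # averaging
d_vote = Counter(round(e) for e in est).most_common(1)[0][0]   # voting
print(d_avg, d_vote)  # d_vote recovers 2 exactly; d_avg lands near 2
```

Voting returns an integer and, as the slide's bounds suggest, is the more robust of the two: a few wildly-off point estimates can drag the average but rarely flip the majority.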

  20. Experiments

  21. Varying the Manifold Dimension [Plot: mean absolute dimension-estimation error vs. number of samples, log-log axes, for the S4 and S8 datasets]

  22. Varying Embedding Space Dimension [Plot: mean absolute dimension-estimation errors vs. number of samples for X (D = 3), X' (D = 6), and X'' (D = 12)]

  23. Other Datasets

Data set    | n=50    | n=100     | n=500     | n=1000    | n=5000
S1          | 98 (99) | 100 (100) | 100 (100) | 100 (100) | 100 (100)
S3          | 75 (19) | 95 (20)   | 100 (15)  | 100 (19)  | 100 (62)
S5          | 33 (5)  | 50 (10)   | 100 (9)   | 98 (2)    | 100 (0)
S7          | 18 (2)  | 17 (3)    | 57 (1)    | 54 (1)    | 100 (0)
Sinusoid    | 92 (98) | 100 (100) | 100 (100) | 100 (100) | 100 (100)
10-Möbius   | 69 (47) | 13 (74)   | 100 (98)  | 100 (99)  | 100 (100)
Swiss roll  | 62 (71) | 49 (91)   | 88 (96)   | 100 (100) | 100 (100)

  24. Conclusions and Future Work • New algorithm • Competitive results • Manifold-adaptive convergence rate • Other ML methods? • K-NN regression can! • Penalized least squares in the works • Dimension Reduction?

  25. Questions?

  26. Curse of Dimensionality • High-dimensional data increases the complexity of the function space • Higher variance with the same number of samples • More samples needed for the same precision

  27. Lower Bound. Assume that $m_n$ is a regression function that estimates the random variable $Y$ based on $X$ and $D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, and $m(X) = E[Y \mid X]$. What is the best possible performance of $m_n$ in the $L_2$ sense, i.e. $E\{\| m_n(X) - m(X) \|^2\}$? For the class $\mathcal{D}^{(p,C)}$ of $(X, Y)$ distributions, when $X \in \mathbb{R}^D$, we have the following behavior:

$E\{\| m_n(X) - m(X) \|^2\} \ge \Omega\big(n^{-\frac{2p}{2p+D}}\big)$
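Plugging numbers into this rate shows how fast the required sample size blows up: to push the error $n^{-2p/(2p+D)}$ below a target $\varepsilon$, one needs $n \ge \varepsilon^{-(2p+D)/(2p)}$. A back-of-the-envelope calculation (p = 1 and $\varepsilon$ = 0.1 are illustrative choices, not from the talk):

```python
# Samples needed so that n**(-2p/(2p+D)) <= eps, i.e. n >= eps**(-(2p+D)/(2p)).
p, eps = 1.0, 0.1
for D in (1, 5, 100):
    n = eps ** (-(2 * p + D) / (2 * p))
    print(f"D = {D:3d}: n >= {n:.3g}")
```

With these choices, D = 1 needs about 32 samples, D = 5 about 3200, and D = 100 about 10^51, which is the point of the deck's earlier curse-of-dimensionality plot.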

  28. Two sources of error: • Approximation error: assuming a fixed η(x, r) • Estimation error: estimating P(X ∈ B(x, r)) by the empirical estimate k/n. Both can be controlled by changing the size of the neighborhood r (which is related to k/n).

  29. Effect of k and n [Plots: behavior over the number of samples n (10^1 to 10^4) vs. the neighborhood size K (10^1 to 10^3), for S4 and S8, under averaging and voting]

  30. Noise Effect [Plot: mean absolute estimation error vs. noise level (standard deviation 0 to 0.1) for 10-Möbius embedded in R^3 and R^12, and S4 embedded in R^5 and R^20]

  31–34. Effect of Noise [Scatter plots repeated over four slides: data in the plane, axes from -2 to 2]

  35. Exponential Rate [Plot: probability of error vs. number of samples, log-log axes, for averaging aggregation on S4]
