
Dual-tree Algorithms in Statistics

Ryan Riegel (rriegel@cc.gatech.edu)
Computational Science and Engineering, College of Computing
Georgia Institute of Technology

Outline (relevant citations appear at the top of each slide)


  1–38. Monochromatic all-nearest-neighbors (a single slide stepped through 38 identical animation frames):

  map_{q ∈ X} argmin_{r ∈ X \ {q}} d(q, r)

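To pin down what this map/argmin asks for: every point in X is paired with its nearest *other* point. Here is a minimal brute-force sketch (hypothetical Python, not from the slides) of exactly that computation; it costs O(N^2) distance evaluations, which is the baseline the dual-tree traversal is designed to beat.

```python
import numpy as np

def all_nn_naive(X):
    """Brute-force monochromatic all-NN: for each q in X, the index of
    argmin over r in X \ {q} of d(q, r).  O(N^2) distance evaluations."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # all pairwise d
    np.fill_diagonal(D, np.inf)                                 # exclude r = q
    return D.argmin(axis=1)

X = np.random.rand(100, 2)
print(all_nn_naive(X)[:10])   # nearest-neighbor index for the first 10 points
```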

  39. Ex: Two-point Correlation (Gray and Moore, NIPS 2000)

  Σ_{x1 ∈ X} Σ_{x2 ∈ X} I(d(x1, x2) ≤ h)

  function tpc(X1, X2)
    if d^l(X1, X2) > h, return 0
    if d^u(X1, X2) ≤ h, return |X1| · |X2|
    return tpc(X1^L, X2^L) + tpc(X1^L, X2^R) + tpc(X1^R, X2^L) + tpc(X1^R, X2^R)
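Below is a self-contained Python sketch of this recursion. The Node class, the leaf size, and the bounding-box distance helpers are illustrative assumptions, not the authors' code; the structure is the slide's: an exclusion prune when even the closest possible pair is out of range, an inclusion prune when even the farthest is in range, brute force at the leaves. Later sketches reuse these helpers.

```python
import numpy as np

class Node:
    """Bounding-box tree node over a point set (an illustrative helper)."""
    def __init__(self, pts):
        self.pts, self.lo, self.hi = pts, pts.min(axis=0), pts.max(axis=0)
        self.left = self.right = None
        if len(pts) > 16:                           # leaf size: arbitrary choice
            d = int(np.argmax(self.hi - self.lo))   # split the widest dimension
            order = np.argsort(pts[:, d]); m = len(pts) // 2
            self.left, self.right = Node(pts[order[:m]]), Node(pts[order[m:]])

def d_lower(A, B):
    """Smallest possible distance between points in boxes A and B (d^l)."""
    gap = np.maximum(0.0, np.maximum(A.lo - B.hi, B.lo - A.hi))
    return float(np.linalg.norm(gap))

def d_upper(A, B):
    """Largest possible distance between points in boxes A and B (d^u)."""
    return float(np.linalg.norm(np.maximum(A.hi - B.lo, B.hi - A.lo)))

def tpc(X1, X2, h):
    """Dual-tree two-point correlation: count pairs with d(x1, x2) <= h."""
    if d_lower(X1, X2) > h:
        return 0                                  # exclusion: no pair in range
    if d_upper(X1, X2) <= h:
        return len(X1.pts) * len(X2.pts)          # inclusion: every pair in range
    if X1.left is None and X2.left is None:       # leaf-leaf: brute force
        D = np.linalg.norm(X1.pts[:, None] - X2.pts[None, :], axis=-1)
        return int((D <= h).sum())
    if X1.left is None:
        return tpc(X1, X2.left, h) + tpc(X1, X2.right, h)
    if X2.left is None:
        return tpc(X1.left, X2, h) + tpc(X1.right, X2, h)
    return (tpc(X1.left, X2.left, h) + tpc(X1.left, X2.right, h) +
            tpc(X1.right, X2.left, h) + tpc(X1.right, X2.right, h))

X = np.random.rand(500, 2)
root = Node(X)
print(tpc(root, root, 0.1))   # counts ordered pairs, including x1 = x2, as on the slide
```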

  40. Ex: Range Count (Gray and Moore, NIPS 2000)

  map_{q ∈ Q} Σ_{r ∈ R} I(d(q, r) ≤ h)

  init: ∀q ∈ Q_root, a(q) = 0
  function rng(Q, R)
    if d^l(Q, R) > h, return
    if d^u(Q, R) ≤ h, ∀q ∈ Q, a(q) += |R|; return
    rng(Q^L, R^L); rng(Q^L, R^R)
    rng(Q^R, R^L); rng(Q^R, R^R)
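The same traversal with a per-query accumulator gives range counting. A sketch, reusing the hypothetical Node, d_lower, d_upper helpers (and X, root) from the two-point correlation example above:

```python
def rng(Q, R, h, a):
    """Dual-tree range count: a maps each query point to |{r : d(q, r) <= h}|."""
    if d_lower(Q, R) > h:
        return                                    # exclusion prune
    if d_upper(Q, R) <= h:
        for q in Q.pts:
            a[tuple(q)] += len(R.pts)             # inclusion prune: count all of R
        return
    if Q.left is None and R.left is None:         # leaf-leaf: brute force
        for q in Q.pts:
            a[tuple(q)] += int((np.linalg.norm(R.pts - q, axis=1) <= h).sum())
        return
    Qs = [Q] if Q.left is None else [Q.left, Q.right]
    Rs = [R] if R.left is None else [R.left, R.right]
    for Qc in Qs:
        for Rc in Rs:
            rng(Qc, Rc, h, a)

a = {tuple(q): 0 for q in X}   # the slide's init: a(q) = 0 for all q
rng(root, root, 0.1, a)
```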

  41. Ex: All-nearest-neighbors (Gray and Moore, NIPS 2000)

  map_{q ∈ Q} argmin_{r ∈ R} d(q, r)

  init: ∀q ∈ Q_root, a(q) = ∞
  function allnn(Q, R)
    if a^u(Q) ≤ d^l(Q, R), return
    if (Q, R) = ({q}, {r}), a(q) = min{a(q), d(q, r)}; return
    prioritize {R1, R2} = {R^L, R^R} by d^l(Q^L, ·)
    allnn(Q^L, R1); allnn(Q^L, R2)
    prioritize {R1, R2} = {R^L, R^R} by d^l(Q^R, ·)
    allnn(Q^R, R1); allnn(Q^R, R2)
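A sketch of the same idea, again reusing the earlier hypothetical helpers: prune a node pair when even its closest possible distance cannot beat the worst current best in Q, and visit the closer reference child first so the bounds tighten quickly. For simplicity this version recomputes a^u(Q) by scanning Q's points; real implementations cache that bound per node.

```python
def allnn(Q, R, best, nn):
    """Dual-tree all-NN; best maps each query point to its current NN distance
    (the slide's a(q), an upper bound), nn to the current neighbor."""
    if max(best[tuple(q)] for q in Q.pts) <= d_lower(Q, R):
        return                             # a^u(Q) <= d^l(Q, R): prune the pair
    if Q.left is None and R.left is None:  # leaf-leaf: exhaustive update
        for q in Q.pts:
            d = np.linalg.norm(R.pts - q, axis=1)
            d[np.all(R.pts == q, axis=1)] = np.inf   # monochromatic: skip r = q
            i = int(d.argmin())
            if d[i] < best[tuple(q)]:
                best[tuple(q)], nn[tuple(q)] = d[i], R.pts[i]
        return
    Qs = [Q] if Q.left is None else [Q.left, Q.right]
    Rs = [R] if R.left is None else [R.left, R.right]
    for Qc in Qs:                          # visit the closer reference child first
        for Rc in sorted(Rs, key=lambda Rc: d_lower(Qc, Rc)):
            allnn(Qc, Rc, best, nn)

best = {tuple(q): np.inf for q in X}
nn = {}
allnn(root, root, best, nn)
```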

  42. Ex: Kernel Density Estimation (Lee et al., NIPS 2005; Lee and Gray, UAI 2006)

  map_{q ∈ Q} Σ_{r ∈ R} K_h(q, r)

  init: ∀q ∈ Q_root, a(q) = 0; b = 0
  function kde(Q, R, b)
    if K^u_h(Q, R) − K^l_h(Q, R) < (a^l(Q) + b) · |R| · ε / |R_root|,
      ∀q ∈ Q, a(q) += K^l_h(Q, R); return
    prioritize {R1, R2} = {R^L, R^R} by d^l(Q^L, ·)
    kde(Q^L, R1, b + K^l_h(Q^L, R2)); kde(Q^L, R2, b)
    prioritize {R1, R2} = {R^L, R^R} by d^l(Q^R, ·)
    kde(Q^R, R1, b + K^l_h(Q^R, R2)); kde(Q^R, R2, b)
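The slide's prune is a relative-error rule that carries a running lower bound b through the recursion. The sketch below simplifies that to a per-reference-point absolute tolerance eps (my assumption, easier to verify) while keeping the structure: if the kernel is nearly constant across a node pair, credit a midpoint estimate for all |R| references and stop. It reuses the Node, d_lower, d_upper helpers from the earlier sketches.

```python
def K_gauss(d, h):
    """Gaussian kernel of distance (one choice; the slides leave K_h abstract)."""
    return np.exp(-0.5 * (d / h) ** 2)

def kde(Q, R, h, a, eps):
    """Dual-tree KDE with a simplified absolute-error prune.  Since the kernel
    decreases with distance, K(d^l) and K(d^u) bound it over the node pair."""
    k_hi, k_lo = K_gauss(d_lower(Q, R), h), K_gauss(d_upper(Q, R), h)
    if k_hi - k_lo < eps:                         # kernel ~constant on (Q, R)
        for q in Q.pts:
            a[tuple(q)] += len(R.pts) * 0.5 * (k_lo + k_hi)
        return
    if Q.left is None and R.left is None:         # leaf-leaf: exact evaluation
        for q in Q.pts:
            a[tuple(q)] += float(K_gauss(np.linalg.norm(R.pts - q, axis=1), h).sum())
        return
    Qs = [Q] if Q.left is None else [Q.left, Q.right]
    Rs = [R] if R.left is None else [R.left, R.right]
    for Qc in Qs:                                 # closer reference child first
        for Rc in sorted(Rs, key=lambda Rc: d_lower(Qc, Rc)):
            kde(Qc, Rc, h, a, eps)

a = {tuple(q): 0.0 for q in X}
kde(root, root, 0.1, a, eps=1e-3)
```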

  43. Ex: Kernel Discriminant Analysis (Gray and Riegel, COMPSTAT 2006; Riegel et al., SIAM Data Mining 2008)

  map_{q ∈ Q} argmax_{C ∈ {C1, C2}} (P(C) / |R_C|) Σ_{r ∈ R_C} K_{h_C}(q, r)

  init: ∀q ∈ Q_root, a(q) = δ(Q_root, R_root); enqueue(Q_root, R_root)
  while dequeue(Q, R)  // main loop of kda
    if a^l(Q) > 0 or a^u(Q) < 0, return
    ∀q ∈ Q, a(q) −= δ(Q, R)
    ∀q ∈ Q^L, a(q) += δ(Q^L, R^L) + δ(Q^L, R^R)
    ∀q ∈ Q^R, a(q) += δ(Q^R, R^L) + δ(Q^R, R^R)
    enqueue(Q^L, R^L); enqueue(Q^L, R^R)
    enqueue(Q^R, R^L); enqueue(Q^R, R^R)
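The queue-based bookkeeping above is the hard part; the decision rule it protects is simple. Here is a minimal non-tree sketch of that rule (the Gaussian kernel choice and all names are illustrative assumptions, not the authors' code): assign each query the class with the larger prior-weighted density estimate. The dual-tree version instead maintains bounds a(q) on the *difference* of the two class scores and prunes a query node as soon as the sign of that difference is fixed.

```python
import numpy as np

def kda_classify(Q, R1, R2, h1, h2, p1, p2):
    """Naive kernel discriminant analysis: assign each query the class with
    the larger prior-weighted KDE, P(C)/|R_C| * sum_r K_{h_C}(q, r)."""
    def score(q, R, h, p):
        k = np.exp(-0.5 * (np.linalg.norm(R - q, axis=1) / h) ** 2)
        return p * k.sum() / len(R)
    return np.array([0 if score(q, R1, h1, p1) >= score(q, R2, h2, p2) else 1
                     for q in Q])

R1 = np.random.rand(200, 2)                  # class-1 references (illustrative)
R2 = 0.5 + np.random.rand(300, 2)            # class-2 references
Q = np.random.rand(50, 2)
print(kda_classify(Q, R1, R2, h1=0.1, h2=0.1, p1=0.4, p2=0.6)[:10])
```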

  44–49. Case Study: Quasar Identification (Riegel et al., SIAM Data Mining 2008, submitted; Richards et al., AAS 2008)

  Mining for quasars in the Sloan Digital Sky Survey:
  - Brightest objects in the universe
  - Thus, the farthest/oldest objects we can see
  - Believed to be active galactic nuclei: giant black holes
  - Implications for dark matter, dark energy, etc.
  - Peplow (Nature, 2005) uses one of our catalogs to verify the cosmic magnification effect predicted by relativity

  50–53. Case Study: Quasar Identification (Riegel et al., SIAM Data Mining 2008, submitted; Richards et al., AAS 2008)

  - Trained a KDA classifier on 4D spectral data from about 80,000 known quasars and 400,000 non-quasars; identified about 1 million quasars among 40 million unknown objects.
  - Took 640 seconds in serial; half of that was tree-building. The naïve computation takes 380 hours, excluding bandwidth learning.
  - Algorithmic parameters are key to performance:
    - Hybrid breadth-/depth-first expansion
    - Epanechnikov kernel (choice of f) to maximize pruning (see the kernel-bound sketch below)
    - Multi-bandwidth algorithm for faster bandwidth fitting
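Why the Epanechnikov kernel helps pruning: it has finite support, so any node pair separated by more than the bandwidth contributes exactly zero and can be pruned with *no* approximation error, unlike the infinite-tailed Gaussian. A small sketch (helper names are my own):

```python
import numpy as np

def epanechnikov(d, h):
    """Epanechnikov kernel of distance, up to normalization: max(0, 1 - (d/h)^2).
    Exactly zero once d >= h."""
    u = np.asarray(d) / h
    return np.maximum(0.0, 1.0 - u * u)

def node_kernel_bounds(d_lo, d_hi, h):
    """(K^l, K^u) over a node pair whose distances lie in [d_lo, d_hi].
    When d_lo >= h both bounds are 0, so the pair prunes with zero error."""
    return float(epanechnikov(d_hi, h)), float(epanechnikov(d_lo, h))

print(node_kernel_bounds(1.2, 1.5, h=1.0))   # (0.0, 0.0): contributes nothing
print(node_kernel_bounds(0.2, 0.4, h=1.0))   # a nontrivial (K^l, K^u) interval
```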

  54. Case Study: Quasar Identification — [Figure: log-log plot of running time vs. data set size (10^3 to 10^6 points) for leave-one-out cross-validation on the 4D quasar data; curves for Naive, Heap, Heap + Epanechnikov, Hybrid, and Hybrid + Epanechnikov.]

  55–57. GNPs, Formally Speaking (Boyer, Riegel, and Gray's THOR Project; Riegel et al., planned for NIPS 2008 or JMLR 2008)

  Higher-order reduce problem Ψ = g ∘ ψ, with

    ψ(X_1, …, X_n) = ⊗^1_{x_1 ∈ X_1} ⋯ ⊗^n_{x_n ∈ X_n} f(x_1, …, x_n),

  subject to the decomposability requirement

    ψ(…, X_i, …) = ψ(…, X_i^L, …) ⊗_i ψ(…, X_i^R, …)

  for all 1 ≤ i ≤ n and partitions X_i^L ∪ X_i^R = X_i. We'll also need some means of bounding the results of ψ.
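To make the higher-order reduce concrete, here is a minimal sketch of ψ as nested folds (my own illustration; it handles ordinary associative operators such as + and min, with the slides' map being the degenerate "collect one result per element" case):

```python
import operator
from functools import reduce

def psi(ops, sets, f):
    """psi(X_1, ..., X_n) = op_1 over X_1 of ... op_n over X_n of f(x_1, ..., x_n),
    folded naively from the outermost index inward."""
    def rec(prefix, i):
        if i == len(sets):
            return f(*prefix)
        return reduce(ops[i], (rec(prefix + (x,), i + 1) for x in sets[i]))
    return rec((), 0)

# Two-point correlation as a GNP: both operators are +.
X = [0.0, 0.3, 1.0]
pairs = psi([operator.add, operator.add], [X, X],
            lambda x1, x2: int(abs(x1 - x2) <= 0.5))
print(pairs)   # 5 ordered pairs within range (self-pairs included)
```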

  58–60. Decomposability (Riegel et al., planned for NIPS 2008 or JMLR 2008)

  Decomposability is restrictive, but it is always attainable for problems formed from combinations of map with a single other operator ⊗. It is equivalent to

    ⊗^1_{x_1 ∈ X_1} ⋯ ⊗^n_{x_n ∈ X_n} f(x_1, …, x_n) = ⊗^{p_1}_{x_{p_1} ∈ X_{p_1}} ⋯ ⊗^{p_n}_{x_{p_n} ∈ X_{p_n}} f(x_1, …, x_n)

  for all permutations p of the set {1, …, n}, and to the interchange law

    (ψ(X_i^L, X_j^L) ⊗_i ψ(X_i^R, X_j^L)) ⊗_j (ψ(X_i^L, X_j^R) ⊗_i ψ(X_i^R, X_j^R))
      = (ψ(X_i^L, X_j^L) ⊗_j ψ(X_i^L, X_j^R)) ⊗_i (ψ(X_i^R, X_j^L) ⊗_j ψ(X_i^R, X_j^R)).
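A small numeric check of this interchange for the all-nearest-neighbors pair of operators, modeling map as a dict keyed by query index (all names are my own): splitting the reference set first and splitting the query set first must give the same answer.

```python
def psi(Q, R):
    """All-NN GNP kernel on 1-D data: for each query (keyed by index), the min
    over references of |q - r|; 'map' is modeled as one dict entry per query."""
    return {i: min(abs(q - r) for r in R) for i, q in Q.items()}

def combine_map(a, b):    # map direction: disjoint query sets concatenate
    return {**a, **b}

def combine_min(a, b):    # min direction: same queries, elementwise minimum
    return {i: min(a[i], b[i]) for i in a}

Q = {0: 0.1, 1: 0.5, 2: 0.9}
R = [0.0, 0.4, 1.0]
QL, QR = {0: 0.1}, {1: 0.5, 2: 0.9}
RL, RR = R[:1], R[1:]

split_R_first = combine_min(combine_map(psi(QL, RL), psi(QR, RL)),
                            combine_map(psi(QL, RR), psi(QR, RR)))
split_Q_first = combine_map(combine_min(psi(QL, RL), psi(QL, RR)),
                            combine_min(psi(QR, RL), psi(QR, RR)))
assert split_R_first == split_Q_first   # both split orders give the same answer
print(split_R_first)
```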

  61. Decomposability

    ψ(X, Y) = ⊙_{x ∈ X} ⊗_{y ∈ Y} f(x, y)
      = (f(x_1, y_1) ⊗ f(x_1, y_2) ⊗ ⋯ ⊗ f(x_1, y_M))
      ⊙ (f(x_2, y_1) ⊗ f(x_2, y_2) ⊗ ⋯ ⊗ f(x_2, y_M))
      ⊙ ⋯
      ⊙ (f(x_N, y_1) ⊗ f(x_N, y_2) ⊗ ⋯ ⊗ f(x_N, y_M))

  62. Decomposability

    ψ(X, Y) = ψ(X, Y^L) ⊗ ψ(X, Y^R)

  With Y^L = {y_1} and Y^R = {y_2, …, y_M}, the same grid of kernel values splits by column:

    (f(x_1, y_1) ⊙ f(x_2, y_1) ⊙ ⋯ ⊙ f(x_N, y_1))
      ⊗ ((f(x_1, y_2) ⊗ ⋯ ⊗ f(x_1, y_M)) ⊙ (f(x_2, y_2) ⊗ ⋯ ⊗ f(x_2, y_M)) ⊙ ⋯ ⊙ (f(x_N, y_2) ⊗ ⋯ ⊗ f(x_N, y_M)))

  63–65. Transforming Problems into GNPs (Riegel et al., planned for NIPS 2008 or JMLR 2008)

  ("Serial" GNPs.) Decomposable or not,

    g_1( ⊗^1_{x_1 ∈ X_1} g_2( ⊗^2_{x_2 ∈ X_2} ⋯ g_n( ⊗^n_{x_n ∈ X_n} f(x_1, …, x_n) ) ⋯ ) )

  may be transformed into nested GNPs by replacing every other operator with map and factoring the intermediate g_i out.

  ("Parallel" GNPs.) Also GNP-able are problems such as

    map_i [ Σ_j w_ij K(x_i, x_j) / Σ_j K(x_i, x_j) ]

  ("Multi" GNPs.) Wrap the problem with map to vary a parameter.
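The "parallel" example is the Nadaraya-Watson ratio: its numerator and denominator are two GNPs over the same point pairs, which is why a single traversal can serve both. A naive sketch with per-reference weights w_j (the slide's w_ij is more general; all names here are assumptions):

```python
import numpy as np

def nadaraya_watson(X, w, h):
    """map_i [ sum_j w_j K(x_i, x_j) / sum_j K(x_i, x_j) ]: numerator and
    denominator share the same pairs, evaluated here in one naive O(N^2) sweep."""
    D = np.abs(X[:, None] - X[None, :])      # pairwise distances (1-D points)
    K = np.exp(-0.5 * (D / h) ** 2)          # Gaussian kernel matrix
    return (K @ w) / K.sum(axis=1)

X = np.linspace(0.0, 1.0, 50)
w = np.sin(2 * np.pi * X)                    # per-reference weights (assumed w_j)
print(nadaraya_watson(X, w, h=0.1)[:5])
```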

  66. The Algorithm (Boyer, Riegel, and Gray's THOR Project; Riegel et al., planned for ICML 2008 or JMLR 2008)

  "One algorithm to solve them all":

    ψ(X_1, …, X_n) ←
      | a                                     if bounds prove it is safe to prune to a
      | f(x_1, …, x_n)                        if each X_i = {x_i}, i.e. is a leaf
      | ψ(…, X_i^L, …) ⊗_i ψ(…, X_i^R, …)     otherwise
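A runnable sketch of this three-way case analysis (my own skeleton, not the THOR code): a prune test that may return a safe value, a base case that evaluates f at leaves, and a recursive split combined with ⊗. It is instantiated here for two-point correlation on 1-D data, where the prune supplies 0 or |X1|·|X2| exactly as on slide 39.

```python
class Tree:
    """Binary tree over a list of 1-D values; bounds kept as (min, max)."""
    def __init__(self, xs):
        self.xs, self.lo, self.hi = xs, min(xs), max(xs)
        self.left = self.right = None
        if len(xs) > 1:
            xs = sorted(xs); m = len(xs) // 2
            self.left, self.right = Tree(xs[:m]), Tree(xs[m:])

def gnp(A, B, f, otimes, prune):
    """The slide's 'one algorithm': prune(A, B) returning a value decides case 1;
    singleton nodes are case 2; otherwise split and combine with otimes."""
    a = prune(A, B)
    if a is not None:
        return a                                   # bounds proved a safe value
    if A.left is None and B.left is None:
        return f(A.xs[0], B.xs[0])                 # leaf: evaluate f directly
    if A.left is None:
        return otimes(gnp(A, B.left, f, otimes, prune),
                      gnp(A, B.right, f, otimes, prune))
    if B.left is None:
        return otimes(gnp(A.left, B, f, otimes, prune),
                      gnp(A.right, B, f, otimes, prune))
    return otimes(otimes(gnp(A.left, B.left, f, otimes, prune),
                         gnp(A.left, B.right, f, otimes, prune)),
                  otimes(gnp(A.right, B.left, f, otimes, prune),
                         gnp(A.right, B.right, f, otimes, prune)))

# Instantiation: two-point correlation with h = 0.5 on 1-D data.
def tpc_prune(A, B, h=0.5):
    if max(A.lo - B.hi, B.lo - A.hi) > h:          # d^l > h: all pairs out
        return 0
    if max(A.hi - B.lo, B.hi - A.lo) <= h:         # d^u <= h: all pairs in
        return len(A.xs) * len(B.xs)
    return None                                    # no prune: recurse

T = Tree([0.0, 0.2, 0.9, 1.0])
print(gnp(T, T, lambda x, y: int(abs(x - y) <= 0.5),
          lambda u, v: u + v, tpc_prune))          # 8 ordered pairs in range
```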
