 
Monochromatic all-nearest-neighbors: $\operatorname{map}_{q \in X} \, \operatorname{argmin}_{r \in X \setminus \{q\}} \, d(q, r)$
Dual-tree Algorithms in Statistics – p.15/77
Ex: Two-point Correlation
Gray and Moore, NIPS 2000
$\sum_{x_1 \in X} \sum_{x_2 \in X} I(d(x_1, x_2) \le h)$
function tpc(X_1, X_2)
  if d^l(X_1, X_2) > h, return 0
  if d^u(X_1, X_2) ≤ h, return |X_1| · |X_2|
  return tpc(X_1^L, X_2^L) + tpc(X_1^L, X_2^R)
       + tpc(X_1^R, X_2^L) + tpc(X_1^R, X_2^R)
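A minimal Python sketch of the tpc recursion, assuming a simple kd-tree with axis-aligned bounding boxes. The `Node` class, `leaf_size`, and the `d_lower`/`d_upper` helpers are illustrative names that do not appear on the slides, and a brute-force base case is added for leaf buckets. As in the formula, ordered pairs are counted, including each point paired with itself.

```python
import math

class Node:
    """Hypothetical kd-tree node: a bounding box plus points or two children."""
    def __init__(self, pts, leaf_size=8):
        self.pts = pts
        dims = range(len(pts[0]))
        self.lo = [min(p[k] for p in pts) for k in dims]
        self.hi = [max(p[k] for p in pts) for k in dims]
        self.left = self.right = None
        if len(pts) > leaf_size:
            k = max(dims, key=lambda j: self.hi[j] - self.lo[j])  # widest dimension
            pts = sorted(pts, key=lambda p: p[k])
            mid = len(pts) // 2
            self.left = Node(pts[:mid], leaf_size)
            self.right = Node(pts[mid:], leaf_size)

def d_lower(a, b):
    """Smallest possible distance between a point in box a and one in box b."""
    return math.sqrt(sum(max(0.0, a.lo[k] - b.hi[k], b.lo[k] - a.hi[k]) ** 2
                         for k in range(len(a.lo))))

def d_upper(a, b):
    """Largest possible distance between a point in box a and one in box b."""
    return math.sqrt(sum(max(a.hi[k] - b.lo[k], b.hi[k] - a.lo[k]) ** 2
                         for k in range(len(a.lo))))

def tpc(a, b, h):
    if d_lower(a, b) > h:            # exclusion prune: no pair can be within h
        return 0
    if d_upper(a, b) <= h:           # inclusion prune: every pair is within h
        return len(a.pts) * len(b.pts)
    if a.left is None and b.left is None:   # both leaves: count directly
        return sum(1 for p in a.pts for q in b.pts if math.dist(p, q) <= h)
    a_kids = [a] if a.left is None else [a.left, a.right]
    b_kids = [b] if b.left is None else [b.left, b.right]
    return sum(tpc(x, y, h) for x in a_kids for y in b_kids)
```

For a point list `pts`, `root = Node(pts)` followed by `tpc(root, root, h)` gives the same count as the double loop over all ordered pairs.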
Ex: Range Count
Gray and Moore, NIPS 2000
$\operatorname{map}_{q \in Q} \sum_{r \in R} I(d(q, r) \le h)$
init ∀ q ∈ Q^root, a(q) = 0
function rng(Q, R)
  if d^l(Q, R) > h, return
  if d^u(Q, R) ≤ h, ∀ q ∈ Q, a(q) += |R|; return
  rng(Q^L, R^L); rng(Q^L, R^R)
  rng(Q^R, R^L); rng(Q^R, R^R)
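The rng recursion can be sketched in Python as follows. This is an illustration under assumptions: the index-based kd-tree `Node`, `leaf_size`, and the bound helpers are hypothetical, and leaf buckets are resolved by brute force rather than recursing down to single points.

```python
import math

class Node:
    """Hypothetical kd-tree node holding indices into a shared point list."""
    def __init__(self, pts, idx=None, leaf_size=8):
        self.idx = list(range(len(pts))) if idx is None else idx
        dims = range(len(pts[0]))
        self.lo = [min(pts[i][k] for i in self.idx) for k in dims]
        self.hi = [max(pts[i][k] for i in self.idx) for k in dims]
        self.kids = []
        if len(self.idx) > leaf_size:
            k = max(dims, key=lambda j: self.hi[j] - self.lo[j])
            order = sorted(self.idx, key=lambda i: pts[i][k])
            mid = len(order) // 2
            self.kids = [Node(pts, order[:mid], leaf_size),
                         Node(pts, order[mid:], leaf_size)]

def d_lower(a, b):
    return math.sqrt(sum(max(0.0, a.lo[k] - b.hi[k], b.lo[k] - a.hi[k]) ** 2
                         for k in range(len(a.lo))))

def d_upper(a, b):
    return math.sqrt(sum(max(a.hi[k] - b.lo[k], b.hi[k] - a.lo[k]) ** 2
                         for k in range(len(a.lo))))

def range_count(queries, refs, h):
    """For each query, count the reference points within distance h."""
    a = [0] * len(queries)                      # init: a(q) = 0

    def rng(Q, R):
        if d_lower(Q, R) > h:                   # nothing in R is in range
            return
        if d_upper(Q, R) <= h:                  # all of R is in range
            for i in Q.idx:
                a[i] += len(R.idx)
            return
        if not Q.kids and not R.kids:           # leaf-leaf: count directly
            for i in Q.idx:
                a[i] += sum(1 for j in R.idx
                            if math.dist(queries[i], refs[j]) <= h)
            return
        for Qc in Q.kids or [Q]:                # split whichever nodes can split
            for Rc in R.kids or [R]:
                rng(Qc, Rc)

    rng(Node(queries), Node(refs))
    return a
```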
Ex: All-nearest-neighbors
Gray and Moore, NIPS 2000
$\operatorname{map}_{q \in Q} \, \operatorname{argmin}_{r \in R} \, d(q, r)$
init ∀ q ∈ Q^root, a(q) = ∞
function allnn(Q, R)
  if a^u(Q) ≤ d^l(Q, R), return
  if (Q, R) = ({q}, {r}), a(q) = min{a(q), d(q, r)}; return
  prioritize {R_1, R_2} = {R^L, R^R} by d^l(Q^L, ·)
  allnn(Q^L, R_1); allnn(Q^L, R_2)
  prioritize {R_1, R_2} = {R^L, R^R} by d^l(Q^R, ·)
  allnn(Q^R, R_1); allnn(Q^R, R_2)
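A hedged Python sketch of allnn for the bichromatic case (separate query and reference sets). The `Node` class, its `bound` field (playing the role of a^u(Q)), and `leaf_size` are assumptions made for illustration; leaf buckets are compared by brute force, and the reference child closer to the query node is visited first.

```python
import math

class Node:
    """Hypothetical kd-tree node; `bound` caches an upper bound on a(q) in the node."""
    def __init__(self, pts, idx=None, leaf_size=8):
        self.idx = list(range(len(pts))) if idx is None else idx
        dims = range(len(pts[0]))
        self.lo = [min(pts[i][k] for i in self.idx) for k in dims]
        self.hi = [max(pts[i][k] for i in self.idx) for k in dims]
        self.bound = math.inf
        self.kids = []
        if len(self.idx) > leaf_size:
            k = max(dims, key=lambda j: self.hi[j] - self.lo[j])
            order = sorted(self.idx, key=lambda i: pts[i][k])
            mid = len(order) // 2
            self.kids = [Node(pts, order[:mid], leaf_size),
                         Node(pts, order[mid:], leaf_size)]

def d_lower(a, b):
    return math.sqrt(sum(max(0.0, a.lo[k] - b.hi[k], b.lo[k] - a.hi[k]) ** 2
                         for k in range(len(a.lo))))

def all_nearest_neighbors(queries, refs):
    """Return (distances, indices) of each query's nearest reference point."""
    best = [math.inf] * len(queries)            # init: a(q) = infinity
    arg = [None] * len(queries)

    def allnn(Q, R):
        if Q.bound <= d_lower(Q, R):            # R cannot improve any q in Q
            return
        if not Q.kids and not R.kids:           # leaf-leaf: compare directly
            for i in Q.idx:
                for j in R.idx:
                    d = math.dist(queries[i], refs[j])
                    if d < best[i]:
                        best[i], arg[i] = d, j
            Q.bound = max(best[i] for i in Q.idx)
            return
        for Qc in Q.kids or [Q]:
            # prioritize: visit the nearer reference child first
            for Rc in sorted(R.kids or [R], key=lambda r: d_lower(Qc, r)):
                allnn(Qc, Rc)
        if Q.kids:                              # tighten the parent's bound
            Q.bound = max(c.bound for c in Q.kids)

    allnn(Node(queries), Node(refs))
    return best, arg
```

Visiting the nearer child first tends to shrink the bound early, which is what makes the first prune effective on the farther child.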
Ex: Kernel Density Estimation
Lee et al., NIPS 2005; Lee and Gray, UAI 2006
$\operatorname{map}_{q \in Q} \sum_{r \in R} K_h(q, r)$
init ∀ q ∈ Q^root, a(q) = 0; b = 0
function kde(Q, R, b)
  if $K_h^u(Q, R) - K_h^l(Q, R) < \frac{(a^l(Q) + b)\,|R|}{|R^{root}|}\,\epsilon$,
    ∀ q ∈ Q, a(q) += K_h^l(Q, R); return
  prioritize {R_1, R_2} = {R^L, R^R} by d^l(Q^L, ·)
  kde(Q^L, R_1, b + K_h^l(Q^L, R_2)); kde(Q^L, R_2, b)
  prioritize {R_1, R_2} = {R^L, R^R} by d^l(Q^R, ·)
  kde(Q^R, R_1, b + K_h^l(Q^R, R_2)); kde(Q^R, R_2, b)
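The sketch below illustrates the dual-tree structure of kde in Python, but it deliberately swaps in a simpler pruning rule than the slide's: instead of the relative-error test involving a^l(Q), b, and ε, a node pair is approximated whenever the kernel's upper and lower bounds differ by at most an absolute tolerance `tol`, using their midpoint. That gives an absolute error of at most |R| · tol / 2 per query. The Gaussian kernel, `Node`, `tol`, and `leaf_size` are all assumptions for illustration, and prioritization is omitted.

```python
import math

class Node:
    """Hypothetical kd-tree node holding indices into a shared point list."""
    def __init__(self, pts, idx=None, leaf_size=8):
        self.idx = list(range(len(pts))) if idx is None else idx
        dims = range(len(pts[0]))
        self.lo = [min(pts[i][k] for i in self.idx) for k in dims]
        self.hi = [max(pts[i][k] for i in self.idx) for k in dims]
        self.kids = []
        if len(self.idx) > leaf_size:
            k = max(dims, key=lambda j: self.hi[j] - self.lo[j])
            order = sorted(self.idx, key=lambda i: pts[i][k])
            mid = len(order) // 2
            self.kids = [Node(pts, order[:mid], leaf_size),
                         Node(pts, order[mid:], leaf_size)]

def d_lower(a, b):
    return math.sqrt(sum(max(0.0, a.lo[k] - b.hi[k], b.lo[k] - a.hi[k]) ** 2
                         for k in range(len(a.lo))))

def d_upper(a, b):
    return math.sqrt(sum(max(a.hi[k] - b.lo[k], b.hi[k] - a.lo[k]) ** 2
                         for k in range(len(a.lo))))

def kde(queries, refs, h, tol):
    """Unnormalized Gaussian kernel sums, one per query, within |refs|*tol/2."""
    def kern(d):
        return math.exp(-0.5 * (d / h) ** 2)

    a = [0.0] * len(queries)                    # init: a(q) = 0

    def rec(Q, R):
        k_hi = kern(d_lower(Q, R))              # kernel decreases with distance
        k_lo = kern(d_upper(Q, R))
        if k_hi - k_lo <= tol:                  # absolute-error prune (assumption)
            mid = 0.5 * (k_hi + k_lo)
            for i in Q.idx:
                a[i] += mid * len(R.idx)
            return
        if not Q.kids and not R.kids:           # leaf-leaf: exact sum
            for i in Q.idx:
                a[i] += sum(kern(math.dist(queries[i], refs[j])) for j in R.idx)
            return
        for Qc in Q.kids or [Q]:
            for Rc in R.kids or [R]:
                rec(Qc, Rc)

    rec(Node(queries), Node(refs))
    return a
```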
Ex: Kernel Discriminant Analysis
Gray and Riegel, COMPSTAT 2006; Riegel et al., SIAM Data Mining 2008
$\operatorname{map}_{q \in Q} \, \operatorname{argmax}_{C \in \{C_1, C_2\}} \frac{P(C)}{|R_C|} \sum_{r \in R_C} K_{h_C}(q, r)$
init ∀ q ∈ Q^root, a(q) = δ(Q^root, R^root)
enqueue(Q^root, R^root)
while dequeue(Q, R) // Main loop of kda
  if a^l(Q) > 0 or a^u(Q) < 0, return
  ∀ q ∈ Q, a(q) −= δ(Q, R)
  ∀ q ∈ Q^L, a(q) += δ(Q^L, R^L) + δ(Q^L, R^R)
  ∀ q ∈ Q^R, a(q) += δ(Q^R, R^L) + δ(Q^R, R^R)
  enqueue(Q^L, R^L); enqueue(Q^L, R^R)
  enqueue(Q^R, R^L); enqueue(Q^R, R^R)
Case Study: Quasar Identification
Riegel et al., SIAM Data Mining 2008 (Submitted); Richards et al., AAS 2008
Mining for quasars in the Sloan Digital Sky Survey:
  Brightest objects in the universe
  Thus, the farthest/oldest we can see
  Believed to be active galactic nuclei: giant black holes
  Implications for dark matter, dark energy, etc.
Peplow, Nature 2005 uses one of our catalogs to verify the cosmic magnification effect predicted by relativity
Case Study: Quasar Identification
Riegel et al., SIAM Data Mining 2008 (Submitted); Richards et al., AAS 2008
Trained a KDA classifier on 4D spectra data from about 80k known quasars and 400k non-quasars.
Identified about 1 million quasars among 40 million unknown objects.
Took 640 seconds in serial; half of that was tree-building.
The naive algorithm takes 380 hours, excluding bandwidth learning.
Algorithmic parameters are key to performance:
  Hybrid breadth-depth first expansion
  Epanechnikov kernel (choice of f) to maximize pruning
  Multi-bandwidth algorithm for faster bandwidth fitting
Case Study: Quasar Identification
[Figure: LOO CV on 4D Quasar Data. Log-log plot of running time (axis from 10^-2 to 10^4) against data set size (10^3 to 10^6) for five methods: Naive; Heap; Heap, Epan; Hybrid; Hybrid, Epan.]
GNPs, Formally Speaking
Boyer, Riegel, and Gray's THOR Project; (Planned) Riegel et al., NIPS 2008 or JMLR 2008
Higher-order reduce problem $\Psi = g \circ \psi$, with
$$\psi(X_1, \ldots, X_n) = \underset{x_1 \in X_1}{\otimes_1} \cdots \underset{x_n \in X_n}{\otimes_n} f(x_1, \ldots, x_n)$$
subject to the decomposability requirement
$$\psi(\ldots, X_i, \ldots) = \psi(\ldots, X_i^L, \ldots) \otimes_i \psi(\ldots, X_i^R, \ldots)$$
for all $1 \le i \le n$ and partitions $X_i^L \cup X_i^R = X_i$.
We'll also need some means of bounding the results of $\psi$.
Decomposability
(Planned) Riegel et al., NIPS 2008 or JMLR 2008
Decomposability is restrictive; always possible for problems formed by combinations of map and some one other $\otimes$.
It is equivalent to
$$\underset{x_1 \in X_1}{\otimes_1} \cdots \underset{x_n \in X_n}{\otimes_n} f(x_1, \ldots, x_n) = \underset{x_{p_1} \in X_{p_1}}{\otimes_{p_1}} \cdots \underset{x_{p_n} \in X_{p_n}}{\otimes_{p_n}} f(x_1, \ldots, x_n)$$
for all permutations $p$ of the set $\{1, \ldots, n\}$, and to
$$\bigl(\psi(X_i^L, X_j^L) \otimes_i \psi(X_i^R, X_j^L)\bigr) \otimes_j \bigl(\psi(X_i^L, X_j^R) \otimes_i \psi(X_i^R, X_j^R)\bigr) = \bigl(\psi(X_i^L, X_j^L) \otimes_j \psi(X_i^L, X_j^R)\bigr) \otimes_i \bigl(\psi(X_i^R, X_j^L) \otimes_j \psi(X_i^R, X_j^R)\bigr)$$
Decomposability
$$\psi(X, Y) = \bigodot_{x \in X} \bigotimes_{y \in Y} f(x, y)$$
$$\begin{aligned}
= {} & \bigl( f(x_1, y_1) \otimes f(x_1, y_2) \otimes \cdots \otimes f(x_1, y_M) \bigr) \\
\odot {} & \bigl( f(x_2, y_1) \otimes f(x_2, y_2) \otimes \cdots \otimes f(x_2, y_M) \bigr) \\
\odot {} & \;\cdots \\
\odot {} & \bigl( f(x_N, y_1) \otimes f(x_N, y_2) \otimes \cdots \otimes f(x_N, y_M) \bigr)
\end{aligned}$$
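Decomposability
$\psi(X, Y) = \psi(X, Y^L) \otimes \psi(X, Y^R)$, illustrated with $Y^L = \{y_1\}$ and $Y^R = \{y_2, \ldots, y_M\}$:
$$\begin{pmatrix} f(x_1, y_1) \\ \odot \\ f(x_2, y_1) \\ \odot \\ \vdots \\ \odot \\ f(x_N, y_1) \end{pmatrix} \otimes \begin{pmatrix} \bigl( f(x_1, y_2) \otimes \cdots \otimes f(x_1, y_M) \bigr) \\ \odot \\ \bigl( f(x_2, y_2) \otimes \cdots \otimes f(x_2, y_M) \bigr) \\ \odot \\ \vdots \\ \odot \\ \bigl( f(x_N, y_2) \otimes \cdots \otimes f(x_N, y_M) \bigr) \end{pmatrix}$$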
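A small numeric check of the decomposability requirement may help. The Python sketch below uses made-up data and a made-up f(x, y) = |x − y| (neither is from the slides) to compare ψ(X, Y) against ψ(X, Y^L) ⊗ ψ(X, Y^R) for three operator pairs: sum over sum, map over min, and sum over min. The first two decompose; the third does not, which is the sense in which decomposability is restrictive.

```python
from functools import reduce
from operator import add

def f(x, y):
    # Illustrative pairwise function (an assumption, not from the slides).
    return abs(x - y)

def psi(outer, inner, X, Y):
    """Apply `inner` across y in Y for each x, then `outer` across x in X."""
    return reduce(outer, (reduce(inner, (f(x, y) for y in Y)) for x in X))

def psi_map(inner, X, Y):
    """Same, with map as the outer operator: one result per x."""
    return [reduce(inner, (f(x, y) for y in Y)) for x in X]

X = [0, 3, 10]
Y_left, Y_right = [1, 8], [4, 9]
Y = Y_left + Y_right

# sum over sum: recombining the two halves with + reproduces the whole.
sum_sum_ok = (psi(add, add, X, Y)
              == psi(add, add, X, Y_left) + psi(add, add, X, Y_right))

# map over min: recombining elementwise with min reproduces the whole.
map_min_ok = (psi_map(min, X, Y)
              == [min(l, r) for l, r in zip(psi_map(min, X, Y_left),
                                            psi_map(min, X, Y_right))])

# sum over min: recombining with min does NOT reproduce the whole here.
sum_min_ok = (psi(add, min, X, Y)
              == min(psi(add, min, X, Y_left), psi(add, min, X, Y_right)))
```

With this data `sum_sum_ok` and `map_min_ok` are true while `sum_min_ok` is false: the sum of per-x minima over all of Y is smaller than either half's sum of minima.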
Transforming Problems into GNPs
(Planned) Riegel et al., NIPS 2008 or JMLR 2008
("Serial" GNPs.) Decomposable or not,
$$\underset{x_1 \in X_1}{\otimes_1} g_1\Bigl( \underset{x_2 \in X_2}{\otimes_2} g_2\Bigl( \cdots \underset{x_n \in X_n}{\otimes_n} g_n\bigl( f(x_1, \ldots, x_n) \bigr) \cdots \Bigr) \Bigr)$$
may be transformed into nested GNPs by replacing every other operator with map and factoring intermediate $g_i$ out.
("Parallel" GNPs.) Also GNP-able are problems such as:
$$\operatorname{map}_i \frac{\sum_j w_{ij} K(x_i, x_j)}{\sum_j K(x_i, x_j)}$$
("Multi" GNPs.) Wrap problem with map to vary parameter.
The Algorithm
Boyer, Riegel, and Gray's THOR Project; (Planned) Riegel et al., ICML 2008 or JMLR 2008
"One algorithm to solve them all":
$$\psi(X_1, \ldots, X_n) \leftarrow \begin{cases} a, & \text{if bounds prove it is safe to prune to } a, \\ f(x_1, \ldots, x_n), & \text{if each } X_i = \{x_i\} \text{, i.e. is a leaf}, \\ \psi(\ldots, X_i^L, \ldots) \otimes_i \psi(\ldots, X_i^R, \ldots), & \text{otherwise.} \end{cases}$$
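The three-case recursion can be written as one generic function. The Python sketch below is an illustration under assumptions, not the THOR implementation: `solve`, `try_prune`, and `two_point_count` are hypothetical names, halving a sorted list stands in for descending a tree, and the argument to split is chosen as the largest set (the slide does not say how i is picked). It is instantiated for two-point correlation on 1-D integer data.

```python
from operator import add

def solve(sets, f, ops, try_prune):
    """Generic recursion: prune if possible, evaluate at leaves, else split.

    sets      -- list of n non-empty lists (X_1, ..., X_n)
    f         -- function of n points
    ops       -- ops[i] combines the two results after splitting X_i
    try_prune -- returns a value if bounds allow pruning, else None
    """
    pruned = try_prune(sets)
    if pruned is not None:                       # case 1: safe to prune
        return pruned
    if all(len(s) == 1 for s in sets):           # case 2: every X_i is a leaf
        return f(*(s[0] for s in sets))
    # case 3: split one argument (here the largest; a heuristic assumption)
    i = max(range(len(sets)), key=lambda k: len(sets[k]))
    mid = len(sets[i]) // 2
    left = sets[:i] + [sets[i][:mid]] + sets[i + 1:]
    right = sets[:i] + [sets[i][mid:]] + sets[i + 1:]
    return ops[i](solve(left, f, ops, try_prune),
                  solve(right, f, ops, try_prune))

def two_point_count(xs, h):
    """Count ordered pairs (x1, x2) with |x1 - x2| <= h via the generic solver."""
    xs = sorted(xs)                              # sorted halves are intervals

    def indicator(x1, x2):
        return int(abs(x1 - x2) <= h)

    def try_prune(sets):
        a, b = sets
        lower = max(0, a[0] - b[-1], b[0] - a[-1])    # min distance of intervals
        upper = max(a[-1] - b[0], b[-1] - a[0])       # max distance of intervals
        if lower > h:
            return 0
        if upper <= h:
            return len(a) * len(b)
        return None

    return solve([xs, xs], indicator, [add, add], try_prune)
```

Swapping in a different `f`, operator list, and `try_prune` changes the problem being solved without touching `solve`, which is the point of the slide's single recursion.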