Permutation Search Methods are Efficient, Yet Faster Search is Possible

Bileg (Bilegsaikhan) Naidan (1), Leo (Leonid) Boytsov (2), Eric Nyberg (2)
(1) Norwegian University of Science and Technology (NTNU); (2) Carnegie Mellon University (CMU)


  1. Permutation Search Methods are Efficient, Yet Faster Search is Possible
     Bileg (Bilegsaikhan) Naidan (1), Leo (Leonid) Boytsov (2), Eric Nyberg (2)
     (1) Norwegian University of Science and Technology (NTNU); (2) Carnegie Mellon University (CMU)
     https://github.com/searchivarius/NonMetricSpaceLib

  2. Nearest-neighbor search (NN-search) 1/ 17 4/9/15

  5. Nearest-neighbor search (NN-search)
     • Input: A set of n objects and a distance function d(x, y)
     • Query: New object q and k
     • Task: Quickly find the k objects in the database that are most similar to q
     [Figure: query q with k = 3; neighbors labeled 1, 2, 3]

  6. Distance function d(x, y)
     Name | Formula | Symmetry | Triangle ineq.
     Euclidean (L2) | $\sqrt{\sum_i (x_i - y_i)^2}$ | yes | yes
     Cosine distance | $1 - \frac{x \cdot y}{|x||y|}$ | yes | no
     KL-diverg. | $\sum_i x_i \log \frac{x_i}{y_i}$ | no | no
     JS-diverg. | symmetrized & smoothed KL-diverg. | yes | no
     Distance functions can be metric or non-metric
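
The four distances listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: vectors are plain lists of floats, the inputs to the divergences are discrete probability distributions, and the function names are chosen here rather than taken from the presentation.

```python
import math

def euclidean(x, y):
    """L2 distance: symmetric and satisfies the triangle inequality."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine_distance(x, y):
    """1 - cos(x, y); assumes neither vector is all zeros."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return 1.0 - dot / (norm_x * norm_y)

def kl_divergence(x, y):
    """KL(x || y); assumes y_i > 0 wherever x_i > 0. Not symmetric."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0)

def js_divergence(x, y):
    """Symmetrized, smoothed KL: average KL of x and y to their midpoint."""
    m = [(xi + yi) / 2 for xi, yi in zip(x, y)]
    return 0.5 * kl_divergence(x, m) + 0.5 * kl_divergence(y, m)
```

Swapping the arguments of `kl_divergence` generally changes the result, while `js_divergence` gives the same value either way, which is the symmetry distinction the table draws.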

  9. How to find similar objects?
     • Brute-force
        • Exact search
        • Slow: n distance computations
     • Indexing
        • Exact search is mostly slow in high dimensions and/or non-metric spaces: O(n) distance computations
        • Approximate search can be fast
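
As a baseline, brute-force k-NN search can be sketched as follows. The function name and the toy data are illustrative assumptions; `dist` can be any distance function, metric or not.

```python
import heapq
import math

def brute_force_knn(data, q, k, dist):
    """Return the k objects closest to q, nearest first.

    Computes dist(q, x) once for each of the n objects. For a non-symmetric
    distance the argument order matters; d(q, x) is used here.
    """
    return heapq.nsmallest(k, data, key=lambda x: dist(q, x))

# Toy usage with 2-dimensional points and the Euclidean distance.
points = [(0, 0), (1, 1), (5, 5), (2, 2), (9, 9)]
nearest = brute_force_knn(points, (0.9, 0.9), k=3, dist=math.dist)
```

The cost is exactly n distance computations per query, which is why it serves as the reference point for the efficiency improvements reported later.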

  10. State-of-the-art approximate search methods
      • Locality-Sensitive Hashing (LSH)
      • VP-tree/ball-tree (data-dependent tuning)
      • Proximity graphs (kNN-graphs)
      • Permutation methods

  14. Why should we care about permutation methods?
      • Promising universal methods for non-metric spaces
      • Mapping data from “hard” spaces to “easy” spaces (the Euclidean space)
      • Database-friendly methods that are easy to implement on top of a database system or Lucene

  17. Research questions
      • How good are permutation-based projections?
      • How well do permutation methods fare against the state of the art?

  22. Permutation Methods
      • Filter-and-refine methods using pivot-based projection to the permutation space (L1 or L2)
      • Randomly select a set of reference points called pivots
      • Order pivots by their distances to data points to obtain pivot rankings, which we call permutations
      • Filter by comparing permutations to obtain candidate points
      • Refine by comparing candidate points to the query
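
The steps on this slide can be put together in a minimal end-to-end sketch. It assumes brute-force filtering with the L1 distance between permutations; the function names, the `num_candidates` parameter and the fixed random seed are illustrative choices, not part of the presentation.

```python
import heapq
import math
import random

def permutation(point, pivots, dist):
    """Rank of each pivot (1 = closest) when pivots are sorted by distance to point."""
    order = sorted(range(len(pivots)), key=lambda i: dist(point, pivots[i]))
    ranks = [0] * len(pivots)
    for rank, pivot_idx in enumerate(order, start=1):
        ranks[pivot_idx] = rank
    return ranks

def l1(p, r):
    """L1 distance between two permutations."""
    return sum(abs(a - b) for a, b in zip(p, r))

def build_index(data, num_pivots, dist, seed=0):
    """Pick random pivots and precompute the permutation of every data point."""
    pivots = random.Random(seed).sample(data, num_pivots)
    perms = [permutation(x, pivots, dist) for x in data]
    return pivots, perms

def search(data, pivots, perms, q, k, num_candidates, dist):
    q_perm = permutation(q, pivots, dist)
    # Filter: keep the points whose permutations are closest to the query's.
    cand_ids = heapq.nsmallest(
        num_candidates, range(len(data)), key=lambda i: l1(perms[i], q_perm)
    )
    # Refine: rank only the candidates with the original distance.
    return heapq.nsmallest(k, (data[i] for i in cand_ids), key=lambda x: dist(q, x))
```

With `num_candidates` equal to the data set size the search degenerates to exact brute force; smaller values trade recall for fewer computations of the original distance.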

  26. Permutation Methods
      How do we carry out the filtering step?
      • Brute-force searching
      • Indexing of permutations
      • Neighborhood APProximation Index (NAPP) is the best approach
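
The slide names NAPP without describing it. The sketch below shows the general idea as it is commonly described: each point is posted to an inverted file under its closest pivots, and candidates are the points that share enough of those pivots with the query. All names and parameters (`num_closest`, `min_shared`) are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter, defaultdict

def closest_pivots(point, pivots, num_closest, dist):
    """Indices of the num_closest pivots nearest to point."""
    order = sorted(range(len(pivots)), key=lambda i: dist(point, pivots[i]))
    return order[:num_closest]

def build_napp_like_index(data, pivots, num_closest, dist):
    """Inverted file: pivot index -> ids of points that have it among their closest pivots."""
    posting_lists = defaultdict(list)
    for obj_id, x in enumerate(data):
        for p in closest_pivots(x, pivots, num_closest, dist):
            posting_lists[p].append(obj_id)
    return posting_lists

def napp_like_candidates(posting_lists, q, pivots, num_closest, min_shared, dist):
    """Ids of points sharing at least min_shared closest pivots with the query."""
    shared = Counter()
    for p in closest_pivots(q, pivots, num_closest, dist):
        shared.update(posting_lists.get(p, []))
    return sorted(obj_id for obj_id, count in shared.items() if count >= min_shared)
```

The candidates returned here would still go through the refinement step, where they are compared to the query with the original distance.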

  27. Experiments: Datasets
      Name | Distance function | Number of points | Brute-force (sec.) | Dimens.
      Metric Data
      CoPhIR | L2 | 5 · 10^6 | 0.6 | 282
      SIFT | L2 | 5 · 10^6 | 0.3 | 128
      ImageNet | SQFD | 1 · 10^6 | 4.1 | N/A
      Non-Metric Data
      Wiki-sparse | Cosine sim. | 4 · 10^6 | 1.9 | 10^5
      Wiki-8 | KL-div/JS-div | 2 · 10^6 | 0.045/0.28 | 8
      Wiki-128 | KL-div/JS-div | 2 · 10^6 | 0.22/4 | 128
      DNA | Norm. Leven. | 1 · 10^6 | 3.5 | N/A

  28. Experiments: Projection Quality
      Distance in the original space vs. distance in the projected space. The closer to a monotonic mapping, the better.
      [Plot: good projection (original distance: L2)]

  29. Experiments: Projection Quality
      Distance in the original space vs. distance in the projected space. The closer to a monotonic mapping, the better.
      [Plot: bad projection (original distance: JS-div.)]

  30. Experiments: Efficiency vs Accuracy
      Improvement in efficiency over brute-force search vs. accuracy. Higher and to the right is better.
      [Plot: SIFT (L2). x-axis: Recall; y-axis: Improv. in efficiency (log. scale). Methods: VP-tree, MPLSH, kNN-graph (SW), NAPP]

  31. Experiments: Efficiency vs Accuracy
      Improvement in efficiency over brute-force search vs. accuracy. Higher and to the right is better.
      [Plot: Norm. Levenshtein. x-axis: Recall; y-axis: Improv. in efficiency (log. scale). Methods: VP-tree, kNN-graph (NN-desc), brute-force filt. bin., NAPP]
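
The two axes of these plots can be computed as in the sketch below. It assumes recall is the fraction of the true k nearest neighbors that the approximate method returns, and that improvement in efficiency is the ratio of brute-force query time to the method's query time; the function names are illustrative.

```python
def recall(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors found by the approximate search."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

def improvement_in_efficiency(brute_force_time, method_time):
    """How many times faster the method answers a query than brute-force search."""
    return brute_force_time / method_time
```

For example, a method that returns two of the three true neighbors has a recall of 2/3, whatever else it returns in the third slot.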

  33. Conclusions
      • Permutation methods beat state-of-the-art methods (VP-trees, kNN-graphs and Multiprobe LSH) for some data sets, in particular when the distance function is expensive
      • The quality of permutation-based projection can be both good and poor: it appears to be better when the space is metric and/or dimensionality is low

  34. Poster Session Discussion Points: What makes a good, amenable, non-metric space?

  35. Thank you for your attention!

  36. Some technical details

  38. Permutation Methods
      Permutation is a fancy word for a pivot ranking!
      The data points are a, b, c, d in 2-dim. Euclidean space (L2). The Voronoi diagram is produced by 4 pivots π_i.
      Point | Pivot order | Permutation
      a | (π1, π2, π3, π4) | (1, 2, 3, 4)
      b | (π1, π2, π4, π3) | (1, 2, 4, 3)
      c | (π3, π1, π2, π4) | (2, 3, 1, 4)
      d | (π4, π2, π1, π3) | (3, 2, 4, 1)
      Points a and b have similar permutations. For d, the position of π4 is 1.
      [Figure: Voronoi diagram of pivots π1 to π4 with points a, b, c, d]
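
The relation between the "Pivot order" and "Permutation" columns can be checked with a short sketch: the permutation stores, for each pivot, its position in the pivot order. The function name is an illustrative choice; the data is taken from the table on this slide.

```python
def order_to_permutation(pivot_order):
    """Convert a pivot order into a permutation.

    pivot_order[j] is the 1-based id of the (j+1)-th closest pivot;
    the result holds, at index i - 1, the position of pivot i in that order.
    """
    perm = [0] * len(pivot_order)
    for position, pivot_id in enumerate(pivot_order, start=1):
        perm[pivot_id - 1] = position
    return perm

# Pivot orders of the four points from the slide.
pivot_orders = {
    "a": [1, 2, 3, 4],
    "b": [1, 2, 4, 3],
    "c": [3, 1, 2, 4],
    "d": [4, 2, 1, 3],
}
permutations = {name: order_to_permutation(o) for name, o in pivot_orders.items()}
```

For point d, π4 is the closest pivot, so the fourth entry of its permutation is 1.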

  39. Permutation Methods
      • Filtering step: compare permutations instead of original data points to obtain γ candidate points
         • Footrule distance: $\mathrm{Footrule}(x, y) = \sum_i |x_i - y_i|$ (same as L1)
         • Spearman's rho distance (same as L2)
      Point | Footrule(a, •)
      b | |1 − 1| + |2 − 2| + |3 − 4| + |4 − 3| = 2
      c | |1 − 2| + |2 − 3| + |3 − 1| + |4 − 4| = 4
      d | |1 − 3| + |2 − 2| + |3 − 4| + |4 − 1| = 6
      • Refinement step: apply d(q, •) to the candidate points (in our example, γ = 2 and q = a, so we compute d(q, b) and d(q, c))
      [Figure: Voronoi diagram of pivots π1 to π4 with points a, b, c, d and the candidate points marked]
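
The worked example on this slide can be reproduced with a short sketch. The permutations are those of points a to d from the example; the function names are illustrative, and Spearman's rho is taken here as the sum of squared rank differences, whose square root is the L2 distance.

```python
import heapq

def footrule(x, y):
    """Sum of absolute rank differences (L1 between permutations)."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def spearman_rho(x, y):
    """Sum of squared rank differences; ranks candidates the same way as L2."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

perms = {"a": (1, 2, 3, 4), "b": (1, 2, 4, 3), "c": (2, 3, 1, 4), "d": (3, 2, 4, 1)}
query, gamma = "a", 2

# Filtering: keep the gamma points whose permutations are closest to the query's.
others = [name for name in perms if name != query]
candidates = heapq.nsmallest(
    gamma, others, key=lambda name: footrule(perms[query], perms[name])
)
```

Only the points in `candidates` would then be compared to the query with the original distance d.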

  40. Permutation Methods
      Filtering step:
      • Naive approach: brute-force searching
         • using a priority queue
         • incremental sorting [Gonzales 2008] (2× faster than the priority queue approach)
         • binarized permutations (select a threshold b and use the Hamming distance)
      • Brute force in the permutation space is efficient if the distance is expensive.
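
Binarized permutations can be sketched as follows. The slide only says to select a threshold b and use the Hamming distance; the exact convention used here (a bit is set when the pivot's rank exceeds b) and the bit packing into a Python integer are assumptions made for illustration.

```python
def binarize(perm, b):
    """Pack a permutation into an int: bit i is set when the rank of pivot i exceeds b."""
    bits = 0
    for i, rank in enumerate(perm):
        if rank > b:
            bits |= 1 << i
    return bits

def hamming(x, y):
    """Number of bit positions in which two packed permutations differ."""
    return bin(x ^ y).count("1")

# Permutations of points a, b, c, d from the earlier example, threshold b = 2.
perms = {"a": (1, 2, 3, 4), "b": (1, 2, 4, 3), "c": (2, 3, 1, 4), "d": (3, 2, 4, 1)}
codes = {name: binarize(p, 2) for name, p in perms.items()}
```

The binary codes are much smaller than full permutations and can be compared with a single XOR, at the price of a coarser ordering: here a and b collapse to the same code.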

  41. Permutation Methods
      To reduce the cost of the filtering stage, three types of indices were proposed:
      • use the existing methods for metric spaces [Figueroa 2009]
      • the Permutation Prefix Index (PP-Index) [Esuli 2009]
      • the Metric Inverted File (MI-file) [Amato et al. 2008]

  42. Permutation Methods
      Permutation Prefix Index (PP-index) [Esuli 2009]
      Point | Pivot order
      a | (π1, π2, π3, π4)
      b | (π1, π2, π4, π3)
      c | (π3, π1, π2, π4)
      d | (π4, π2, π1, π3)
      [Figure: prefix tree built from the pivot orders of a, b, c, d]
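
The idea of grouping points by a shared prefix of their pivot order can be sketched with a flat dictionary instead of a real prefix tree. This is a simplification for illustration, not the PP-index as published: the function names, the `prefix_len` and `min_candidates` parameters, and the strategy of shortening the query prefix until enough points match are assumptions.

```python
from collections import defaultdict

def build_prefix_index(pivot_orders, prefix_len):
    """Map each pivot-order prefix of length prefix_len to the points that share it."""
    index = defaultdict(list)
    for name, order in pivot_orders.items():
        index[tuple(order[:prefix_len])].append(name)
    return index

def prefix_candidates(index, query_order, prefix_len, min_candidates):
    """Shorten the query prefix until at least min_candidates points share it."""
    found = []
    for length in range(prefix_len, -1, -1):
        prefix = tuple(query_order[:length])
        found = [
            name
            for key, names in index.items()
            if key[:length] == prefix
            for name in names
        ]
        if len(found) >= min_candidates:
            break
    return found

# Pivot orders of the four points from the slide, indexed by prefixes of length 2.
pivot_orders = {"a": [1, 2, 3, 4], "b": [1, 2, 4, 3], "c": [3, 1, 2, 4], "d": [4, 2, 1, 3]}
index = build_prefix_index(pivot_orders, prefix_len=2)
```

Points a and b share the prefix (π1, π2), so a query whose two closest pivots are π1 and π2 retrieves both without scanning c or d.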
