Permutation Search Methods are Efficient, Yet Faster Search is Possible
Bileg (Bilegsaikhan) Naidan (1), Leo (Leonid) Boytsov (2), Eric Nyberg (2)
(1) Norwegian University of Science and Technology (NTNU)
(2) Carnegie Mellon University (CMU)
https://github.com/searchivarius/NonMetricSpaceLib

Nearest-neighbor search (NN-search)
• Input: A set of n objects and a distance function d(x, y)
• Query: New object q and k
• Task: Quickly find the k objects in the database that are most similar to q
[Figure: a query q and its k = 3 nearest neighbors, ranked 1 to 3]
1/17 4/9/15

Distance function d(x, y)

Name             d(x, y)                             Symmetry   Triangle ineq.
Euclidean (L2)   $\sqrt{\sum_i (x_i - y_i)^2}$       yes        yes
Cosine distance  $1 - \frac{x \cdot y}{|x||y|}$      yes        no
KL-diverg.       $\sum_i x_i \log \frac{x_i}{y_i}$   no         no
JS-diverg.       symmetrized & smoothed KL-diverg.   yes        no

Distance functions can be metric or non-metric
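The distance functions on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the sample vectors are made up for the example, the KL and JS versions assume strictly positive probability vectors, and the JS-divergence uses the standard definition via the midpoint distribution, which the slide only summarizes as "symmetrized & smoothed".

```python
import math

def euclidean(x, y):
    # L2 distance: square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_distance(x, y):
    # One minus the cosine of the angle between x and y.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def kl_divergence(x, y):
    # Assumes x and y are strictly positive probability vectors.
    return sum(a * math.log(a / b) for a, b in zip(x, y))

def js_divergence(x, y):
    # Standard Jensen-Shannon form: average KL to the midpoint distribution.
    m = [(a + b) / 2.0 for a, b in zip(x, y)]
    return 0.5 * kl_divergence(x, m) + 0.5 * kl_divergence(y, m)

# Illustrative probability vectors (not taken from the slides).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
print("KL(p, q) =", kl_divergence(p, q), " KL(q, p) =", kl_divergence(q, p))
print("JS(p, q) =", js_divergence(p, q), " JS(q, p) =", js_divergence(q, p))

# Three vectors for which the cosine distance breaks the triangle inequality.
x, y, z = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
print(cosine_distance(x, z), ">", cosine_distance(x, y) + cosine_distance(y, z))
```

Swapping the arguments changes the KL-divergence but not the JS-divergence, and the last line shows a direct path that is longer than the detour through y.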

How to find similar objects?
• Brute-force
  • Exact search
  • Slow: n distance computations
• Indexing
  • Exact search is mostly slow in high dimensions and/or non-metric spaces: O(n) distance computations
  • Approximate search can be fast
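As a baseline, brute-force search can be sketched as follows. The function name `brute_force_knn` and the sample points are assumptions made for this illustration; the point is only that every query costs exactly n distance computations.

```python
import heapq
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def brute_force_knn(data, query, k, dist):
    """Return the k closest (distance, index) pairs using n distance computations."""
    scored = ((dist(query, obj), i) for i, obj in enumerate(data))
    return heapq.nsmallest(k, scored)

# Illustrative 2-dim. points (not taken from the slides).
data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0), (3.0, 4.0)]
neighbors = brute_force_knn(data, (0.0, 0.1), k=3, dist=euclidean)
print(neighbors)
```

Because `dist` is passed in as a parameter, the same loop works unchanged for non-metric distances.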

State-of-the-art approximate search methods
• Locality-Sensitive Hashing (LSH)
• VP-tree/ball-tree (data-dependent tuning)
• Proximity graphs (kNN-graphs)
• Permutation methods

Why should we care about permutation methods?
• Promising universal methods for non-metric spaces
• Mapping data from “hard” spaces to “easy” spaces (the Euclidean space)
• Database-friendly methods that are easy to implement on top of a database system or Lucene

Research questions
• How good are permutation-based projections?
• How well do permutation methods fare against the state of the art?

Permutation Methods
• Filter-and-refine methods using pivot-based projection to the permutation space (L1 or L2)
• Randomly select a set of reference points called pivots
• For each data point, order the pivots by their distance to that point to obtain a pivot ranking, which we call a permutation
• Filter by comparing permutations to obtain candidate points
• Refine by comparing candidate points to the query
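The steps above can be sketched as a small filter-and-refine search. This is a minimal illustration rather than the authors' implementation: the helper names (`permutation`, `perm_l1`, `build_index`, `search`), the parameter `gamma` for the number of candidates, the random data, and the choice of L1 in the permutation space are all assumptions made for the example.

```python
import heapq
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def permutation(obj, pivots, dist):
    """Rank of every pivot when pivots are ordered by distance to obj (1 = closest)."""
    order = sorted(range(len(pivots)), key=lambda j: dist(obj, pivots[j]))
    ranks = [0] * len(pivots)
    for position, pivot_id in enumerate(order, start=1):
        ranks[pivot_id] = position
    return ranks

def perm_l1(p, q):
    # L1 distance between two permutations.
    return sum(abs(a - b) for a, b in zip(p, q))

def build_index(data, num_pivots, dist, seed=0):
    # Pivots are selected at random from the data set.
    pivots = random.Random(seed).sample(data, num_pivots)
    perms = [permutation(obj, pivots, dist) for obj in data]
    return pivots, perms

def search(data, pivots, perms, query, k, gamma, dist):
    q_perm = permutation(query, pivots, dist)
    # Filter: keep the gamma points whose permutations are closest to the query's.
    candidates = heapq.nsmallest(
        gamma, range(len(data)), key=lambda i: perm_l1(q_perm, perms[i]))
    # Refine: apply the original distance to the candidates only.
    return heapq.nsmallest(k, ((dist(query, data[i]), i) for i in candidates))

# Illustrative random 2-dim. data (not taken from the slides).
rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(200)]
pivots, perms = build_index(data, num_pivots=16, dist=euclidean)
query = (0.5, 0.5)
approx = search(data, pivots, perms, query, k=5, gamma=40, dist=euclidean)
print(approx)
```

Here the refinement step touches only `gamma` points, so the original distance is computed `gamma` times plus once per pivot, instead of n times. Setting `gamma` to the data set size makes the search exact again.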

Permutation Methods
How do we carry out the filtering step?
• Brute-force searching
• Indexing of permutations
• Neighborhood APProximation Index (NAPP) is the best approach
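The slides name NAPP without describing how it works. The sketch below assumes the idea as it is commonly described: each data point is stored in an inverted file under its closest pivots, and a point becomes a candidate when it shares enough closest pivots with the query. All names and parameters (`closest_pivots`, `num_index`, `min_shared`) and the 1-dim. data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def abs_diff(x, y):
    # Toy distance on real numbers, used only for this example.
    return abs(x - y)

def closest_pivots(obj, pivots, dist, num_closest):
    """Ids of the num_closest pivots nearest to obj."""
    order = sorted(range(len(pivots)), key=lambda j: dist(obj, pivots[j]))
    return order[:num_closest]

def build_napp_index(data, pivots, dist, num_index):
    # Inverted file: pivot id -> ids of the points that have it among their closest pivots.
    inverted = defaultdict(list)
    for i, obj in enumerate(data):
        for pivot_id in closest_pivots(obj, pivots, dist, num_index):
            inverted[pivot_id].append(i)
    return inverted

def napp_candidates(inverted, query, pivots, dist, num_index, min_shared):
    # Count, for every point, how many closest pivots it shares with the query.
    shared = defaultdict(int)
    for pivot_id in closest_pivots(query, pivots, dist, num_index):
        for i in inverted.get(pivot_id, ()):
            shared[i] += 1
    return [i for i, count in shared.items() if count >= min_shared]

data = [0.0, 0.1, 0.2, 5.2, 5.3, 9.9]
pivots = [0.0, 5.0, 10.0]
inverted = build_napp_index(data, pivots, abs_diff, num_index=2)
candidates = napp_candidates(inverted, 0.15, pivots, abs_diff, num_index=2, min_shared=2)
print(candidates)
```

The candidates would then go through the usual refinement step with the original distance.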

Experiments: Datasets

Name         Distance function   Number of points   Brute-force (sec.)   Dimens.
Metric Data
CoPhIR       L2                  5 · 10^6           0.6                  282
SIFT         L2                  5 · 10^6           0.3                  128
ImageNet     SQFD                1 · 10^6           4.1                  N/A
Non-Metric Data
Wiki-sparse  Cosine sim.         4 · 10^6           1.9                  10^5
Wiki-8       KL-div/JS-div       2 · 10^6           0.045/0.28           8
Wiki-128     KL-div/JS-div       2 · 10^6           0.22/4               128
DNA          Norm. Leven.        1 · 10^6           3.5                  N/A

Experiments: Projection Quality
Distance in the original space vs. distance in the projected space. The closer to a monotonic mapping, the better:
[Figure: good projection (original distance: L2)]
[Figure: bad projection (original distance: JS-div.)]

Experiments: Efficiency vs Accuracy
Improvement in efficiency over brute-force search vs. accuracy. Higher and to the right is better:
[Figure: SIFT (L2). x-axis: Recall; y-axis: Improv. in efficiency (log. scale). Methods: VP-tree, MPLSH, kNN-graph (SW), NAPP]
[Figure: Norm. Levenshtein. x-axis: Recall; y-axis: Improv. in efficiency (log. scale). Methods: VP-tree, kNN-graph (NN-desc), brute-force filt. bin., NAPP]

Conclusions
• Permutation methods beat state-of-the-art methods (VP-trees, kNN-graphs, and Multiprobe LSH) for some data sets, in particular when the distance function is expensive
• The quality of permutation-based projection can be either good or poor: it appears to be better when the space is metric and/or the dimensionality is low

Poster Session Discussion Points
What makes a good, amenable, non-metric space?

Thank you for your attention!

Some technical details

Permutation Methods
Permutation is a fancy word for a pivot ranking!
The data points a, b, c, d lie in 2-dim. Euclidean space (L2).
[Figure: the Voronoi diagram produced by 4 pivots π1, π2, π3, π4]

Point   Pivot Order         Permutation
a       (π1, π2, π3, π4)    (1, 2, 3, 4)
b       (π1, π2, π4, π3)    (1, 2, 4, 3)
c       (π3, π1, π2, π4)    (2, 3, 1, 4)
d       (π4, π2, π1, π3)    (3, 2, 4, 1)

The permutations of a and b are similar. For d, the position of π4 is 1.

Permutation Methods
• Filtering step - compare permutations instead of original data points to obtain γ candidate points
  • Footrule distance: Footrule(x, y) = $\sum_i |x_i - y_i|$ (same as L1)
  • Spearman’s rho distance (same as L2)

Point   Footrule(a, •)
b       |1 − 1| + |2 − 2| + |3 − 4| + |4 − 3| = 2   (candidate point)
c       |1 − 2| + |2 − 3| + |3 − 1| + |4 − 4| = 4   (candidate point)
d       |1 − 3| + |2 − 2| + |3 − 4| + |4 − 1| = 6

• Refinement step - apply d(q, •) to the candidate points (in our example, γ = 2 and q = a, so we compute d(q, b) and d(q, c))
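The worked example on this slide can be reproduced with a few lines of Python. The permutations are the ones from the slide; the variable names and the dictionary layout are choices made for this sketch.

```python
# Permutations of the four example points, as given on the slide.
perms = {
    "a": (1, 2, 3, 4),
    "b": (1, 2, 4, 3),
    "c": (2, 3, 1, 4),
    "d": (3, 2, 4, 1),
}

def footrule(x, y):
    # Sum of absolute rank differences (L1 distance between permutations).
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

query = "a"
gamma = 2  # number of candidate points to keep

# Filtering step: compare the query permutation with every other permutation.
scores = {name: footrule(perms[query], p) for name, p in perms.items() if name != query}
candidates = sorted(scores, key=scores.get)[:gamma]

print(scores)      # footrule distance from a to b, c, and d
print(candidates)  # the gamma points passed on to the refinement step
```

Only the points in `candidates` would be compared with the query using the original distance d(q, •).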

Permutation Methods
Filtering step:
• Naive approach - brute-force searching
  • using a priority queue
  • incremental sorting [Gonzales 2008] (×2 faster than the priority queue approach)
  • binarized permutations (select a threshold b and use the Hamming distance)
• Brute force in the permutation space is efficient if the distance is expensive.
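Binarized permutations can be sketched as bit strings compared with the Hamming distance. The slide only says to select a threshold b; the exact rule used below (a bit is set when the pivot's rank is at least b) is an assumption, as are the function names. The permutations reuse the four-point example from the earlier slide.

```python
import heapq

def binarize(perm, b):
    """Pack a permutation into an int: bit i is set when pivot i has rank >= b (assumed rule)."""
    bits = 0
    for i, rank in enumerate(perm):
        if rank >= b:
            bits |= 1 << i
    return bits

def hamming(x, y):
    # Number of differing bits between two binarized permutations.
    return bin(x ^ y).count("1")

perms = {
    "a": (1, 2, 3, 4),
    "b": (1, 2, 4, 3),
    "c": (2, 3, 1, 4),
    "d": (3, 2, 4, 1),
}
b = 3
codes = {name: binarize(p, b) for name, p in perms.items()}

# Brute-force filtering in the binarized space, keeping the gamma = 2 best matches for a.
others = [name for name in codes if name != "a"]
candidates = heapq.nsmallest(2, others, key=lambda name: hamming(codes["a"], codes[name]))
print(codes, candidates)
```

Binarization is lossy: a and b map to the same code here, while c and d end up equally far from a.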

Permutation Methods
To reduce the cost of the filtering stage, three types of indices were proposed:
• use the existing methods for metric spaces [Figueroa 2009]
• the Permutation Prefix Index (PP-Index) [Esuli 2009]
• the Metric Inverted File (MI-file) [Amato et al. 2008]

Permutation Methods
Permutation Prefix Index (PP-index) [Esuli 2009]

Point   Pivot Order
a       (π1, π2, π3, π4)
b       (π1, π2, π4, π3)
c       (π3, π1, π2, π4)
d       (π4, π2, π1, π3)

[Figure: prefix tree built over the pivot orders, with the points a, b, c, d at its leaves]
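The idea of grouping points by a prefix of their pivot order can be sketched with a dictionary that plays the role of the prefix tree. This is a simplified illustration, not the PP-index as published: the back-off rule (shorten the prefix until enough candidates are found) and the names `build_prefix_index`, `prefix_candidates`, and `min_candidates` are assumptions. Pivot orders are written as pivot numbers, so (1, 2, 4, 3) stands for (π1, π2, π4, π3).

```python
from collections import defaultdict

# Pivot orders of the four example points, as listed on the slide.
pivot_orders = {
    "a": (1, 2, 3, 4),
    "b": (1, 2, 4, 3),
    "c": (3, 1, 2, 4),
    "d": (4, 2, 1, 3),
}

def build_prefix_index(orders, prefix_len):
    # Register every point under each prefix of its pivot order up to prefix_len.
    index = defaultdict(list)
    for name, order in orders.items():
        for length in range(1, prefix_len + 1):
            index[order[:length]].append(name)
    return index

def prefix_candidates(index, query_order, prefix_len, min_candidates):
    # Start with the longest prefix and shorten it until enough points are found.
    for length in range(prefix_len, 0, -1):
        found = index.get(query_order[:length], [])
        if len(found) >= min_candidates:
            return found
    return []

index = build_prefix_index(pivot_orders, prefix_len=2)
same_cell = prefix_candidates(index, (1, 2, 4, 3), prefix_len=2, min_candidates=2)
backed_off = prefix_candidates(index, (3, 4, 1, 2), prefix_len=2, min_candidates=1)
print(same_cell, backed_off)
```

Points that share a long prefix of the pivot order agree on their closest pivots, which is why they are treated as candidates for refinement.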
