
Sublinear Time Nearest Neighbor Search over Generalized Weighted Space



  1. Sublinear Time Nearest Neighbor Search over Generalized Weighted Space
     Yifan Lei, Qiang Huang, Mohan S. Kankanhalli, Anthony K. H. Tung
     School of Computing, National University of Singapore
     Source code: https://github.com/1flei/aws_alsh
     2019/6/11

  2. Applications
     - Nearest Neighbor Search (NNS) is widely used.
     - Example: booking a hotel for ICML 2019
       - Consider each hotel's price, distance to the convention centre, and rating.
       - Query $q$: a hotel that the user booked before and found excellent.
       - Weight vector $w$: different users have different preferences over these conditions, which leads to different choices of hotel.

                  Price   Distance   Rating
       Hotel q      300          7       10
       Hotel 1      400          8       10
       Hotel 2      350          6        8
       Hotel 3      250          9        8
       Hotel 4      200          6        6

       $w = (0.001, 1, 1)$      → Hotel 2
       $w = (0, 1, 3)$          → Hotel 1
       $w = (0.001, -1, 1)$     → Hotel 3
       $w = (-0.001, -1, -1)$   → Hotel 4
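
The hotel example can be checked with a brute-force linear scan. The sketch below is a minimal illustration (the helper names `gwsed` and `nearest` are hypothetical, not taken from the authors' code): it evaluates the weighted squared distance from the Problem Definition slide, $d_w(o, q) = \sum_i w_i (o_i - q_i)^2$, for every hotel and returns the minimizer for each of the four weight vectors on the slide.

```python
# Exhaustive NNS under the weighted squared distance d_w(o, q) = sum_i w_i * (o_i - q_i)^2.

def gwsed(o, q, w):
    """Generalized weighted squared Euclidean distance; weights may be negative."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


def nearest(data, q, w):
    """Return the key of the object that minimizes d_w(o, q) (linear scan)."""
    return min(data, key=lambda name: gwsed(data[name], q, w))


# (price, distance, rating) of each hotel, as listed on the slide.
hotels = {
    "Hotel 1": (400, 8, 10),
    "Hotel 2": (350, 6, 8),
    "Hotel 3": (250, 9, 8),
    "Hotel 4": (200, 6, 6),
}
query = (300, 7, 10)  # the hotel the user liked before

for w in [(0.001, 1, 1), (0, 1, 3), (0.001, -1, 1), (-0.001, -1, -1)]:
    print(w, "->", nearest(hotels, query, w))
# (0.001, 1, 1) -> Hotel 2
# (0, 1, 3) -> Hotel 1
# (0.001, -1, 1) -> Hotel 3
# (-0.001, -1, -1) -> Hotel 4
```

For example, with $w = (0.001, 1, 1)$ Hotel 2 scores $2.5 + 1 + 4 = 7.5$, the smallest of the four. This scan costs $O(nd)$ per query, which is the baseline the proposed methods aim to beat.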

  3. Problem Definition
     - Given
       - a dataset $\mathcal{D}$ of $n$ data objects in $\mathbb{R}^d$;
       - a query $q \in \mathbb{R}^d$ with a weight vector $w \in \mathbb{R}^d$;
       - a measure: the Generalized Weighted Square Euclidean Distance (GWSED)
           $$d_w(o, q) = \sum_{i=1}^{d} w_i (o_i - q_i)^2$$
     - Nearest Neighbor Search (NNS) over $d_w$
       - Find $o^* \in \mathcal{D}$ s.t. $o^* = \arg\min_{o \in \mathcal{D}} d_w(o, q)$.
     - This problem is very fundamental.
       - Furthest Neighbor Search (FNS) and MIPS can be reduced to NNS over $d_w$;
       - e.g., for FNS, $w_i = -1,\ \forall i \implies \arg\min_{o \in \mathcal{D}} d_w(o, q) = \arg\max_{o \in \mathcal{D}} \|o - q\|$.
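
The FNS reduction follows directly from the definition: with $w_i = -1$ for every $i$, $d_w(o, q) = -\|o - q\|^2$, so minimizing $d_w$ maximizes the Euclidean distance. The sketch below checks this on synthetic random data (an assumption made only for illustration) by comparing an exhaustive NNS over $d_w$ with an exhaustive furthest-neighbor scan; `nns_gwsed` and `furthest` are hypothetical helper names.

```python
import random


def gwsed(o, q, w):
    """d_w(o, q) = sum_i w_i * (o_i - q_i)^2."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


def nns_gwsed(data, q, w):
    """Index of argmin_{o in data} d_w(o, q), by exhaustive scan."""
    return min(range(len(data)), key=lambda i: gwsed(data[i], q, w))


def furthest(data, q):
    """Index of argmax_{o in data} ||o - q||, by exhaustive scan."""
    return max(range(len(data)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(data[i], q)))


rng = random.Random(0)
d = 8
data = [[rng.random() for _ in range(d)] for _ in range(200)]
q = [rng.random() for _ in range(d)]

# All-negative weights: d_w(o, q) = -||o - q||^2, so NNS over d_w is exactly FNS.
w_neg = [-1.0] * d
print(nns_gwsed(data, q, w_neg) == furthest(data, q))  # True
```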

  4. Background and Motivations
     - Locality-Sensitive Hashing (LSH)
       - Sublinear time for Near Neighbor Search
       - Insight: construct a hash function $h$ s.t. $\Pr[h(o) = h(q)]$ is monotonic in $\mathrm{Dist}(o, q)$
       - Hidden condition: $\mathrm{Dist}(o, q)$ must be a metric
     - LSH schemes cannot solve NNS over $d_w$ directly ($d_w$ is no longer a metric if some $w_i < 0$).
     - No sublinear time method was previously known for this problem.
     - Motivations
       - Like $d_w$, the inner product $o^T q$ is not a metric.
       - However, Shrivastava & Li (2014) introduced a sublinear time method based on Asymmetric LSH (ALSH), which constructs $P(o)$ for each data object $o \in \mathcal{D}$ and $Q(q)$ for each query $q$.
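
One simple way to see why a negative weight breaks ordinary LSH (an illustration, not the paper's proof of its negative result): with $w = (1, -1)$, a point can be "closer" to another point than to itself, and the triangle inequality fails. If $\Pr[h(o) = h(q)]$ had to decrease with $d_w$, then $d_w(a, b) < d_w(a, a) = 0$ would demand a collision probability above $\Pr[h(a) = h(a)] = 1$. The points below are arbitrary choices for the sketch.

```python
def gwsed(o, q, w):
    """d_w(o, q) = sum_i w_i * (o_i - q_i)^2."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


w = (1.0, -1.0)  # a single negative weight
a, b, c = (0.0, 0.0), (0.0, 1.0), (1.0, 0.0)

print(gwsed(a, a, w))  # 0.0
print(gwsed(a, b, w))  # -1.0: b is "closer" to a than a is to itself
# Triangle inequality d(a, c) <= d(a, b) + d(b, c) fails: 1.0 > -1.0
print(gwsed(a, c, w), gwsed(a, b, w) + gwsed(b, c, w))  # 1.0 -1.0
```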

  5. Spherical Asymmetric Transformation
     - Negative result:
       - There is no Asymmetric LSH family over $\mathbb{R}^d$ for NNS over $d_w$ (Lemma 1 and Theorem 2).
     - Spherical Asymmetric Transformation (SphAT): $\mathbb{R}^d \to \mathbb{R}^{2d}$
           $$P(o) = [\cos(o); \sin(o)], \qquad Q(q, w) = [w \otimes \cos(q); w \otimes \sin(q)]$$
       - where $w \otimes \cos(q) = (w_1 \cos q_1, w_2 \cos q_2, \ldots, w_d \cos q_d)$.
     - Properties of SphAT:
       - $d_w(o, q) \sim$ the Euclidean distance (or angular distance) between $P(o)$ and $Q(q, w)$.
       - SphAT is weight-oblivious (because $P(\cdot)$ is independent of $w$) $\implies$ the index can be built before $q$ and $w$ are given.
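
The relationship between $d_w$ and the transformed vectors can be made explicit. Expanding $\|P(o) - Q(q, w)\|^2 = \|P(o)\|^2 + \|Q(q, w)\|^2 - 2P(o)^T Q(q, w)$ with $\|P(o)\|^2 = d$, $\|Q(q, w)\|^2 = \|w\|^2$ and $\cos(o_i - q_i) = 1 - 2\sin^2\frac{o_i - q_i}{2}$ gives

$$\|P(o) - Q(q, w)\|^2 = d + \|w\|^2 - 2\sum_{i=1}^{d} w_i + 4\sum_{i=1}^{d} w_i \sin^2\frac{o_i - q_i}{2}.$$

Only the last term depends on $o$, and since $4\sin^2(x/2) \approx x^2$ for small $x$, it approaches $d_w(o, q)$ when coordinate differences are small. This derivation uses the slide's definition as written; how the paper keeps differences small is not stated on the slide, so the $0.01$ scaling below is an assumption. The sketch checks the identity numerically; note that `sphat_data` takes no weights, which is what makes the index weight-oblivious.

```python
import math
import random


def sphat_data(o):
    """P(o) = [cos(o); sin(o)], coordinate-wise; does not depend on w."""
    return [math.cos(x) for x in o] + [math.sin(x) for x in o]


def sphat_query(q, w):
    """Q(q, w) = [w (x) cos(q); w (x) sin(q)], with (x) the element-wise product."""
    return ([wi * math.cos(x) for x, wi in zip(q, w)]
            + [wi * math.sin(x) for x, wi in zip(q, w)])


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def gwsed(o, q, w):
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


def sphat_identity_rhs(o, q, w):
    """Closed form of ||P(o) - Q(q, w)||^2 derived above."""
    d = len(o)
    return (d + sum(x * x for x in w) - 2 * sum(w)
            + 4 * sum(wi * math.sin((oi - qi) / 2) ** 2 for oi, qi, wi in zip(o, q, w)))


rng = random.Random(1)
d = 5
w = [rng.uniform(-1, 1) for _ in range(d)]  # mixed-sign weights
o = [rng.uniform(0, 1) for _ in range(d)]
q = [rng.uniform(0, 1) for _ in range(d)]

# The identity holds exactly (up to floating-point error) for any o, q, w.
lhs = sq_dist(sphat_data(o), sphat_query(q, w))
assert math.isclose(lhs, sphat_identity_rhs(o, q, w), rel_tol=1e-9, abs_tol=1e-9)

# With small coordinate differences, the o-dependent term is close to d_w(o, q).
o_s, q_s = [0.01 * x for x in o], [0.01 * x for x in q]
term = 4 * sum(wi * math.sin((a - b) / 2) ** 2 for a, b, wi in zip(o_s, q_s, w))
print(term, gwsed(o_s, q_s, w))
```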

  6. Two Proposed Methods
     - SL-ALSH = SphAT + E2LSH
       - SphAT: $\arg\min_{o \in \mathcal{D}} d_w(o, q) \Rightarrow \arg\min_{o \in \mathcal{D}} \|P(o) - Q(q, w)\|$
       - Apply E2LSH on $P(o)$ and $Q(q, w)$ for NNS over Euclidean distance.
     - S2-ALSH = SphAT + SimHash
       - SphAT: $\arg\min_{o \in \mathcal{D}} d_w(o, q) \Rightarrow \arg\max_{o \in \mathcal{D}} \frac{P(o)^T Q(q, w)}{\|P(o)\| \, \|Q(q, w)\|}$
       - Apply SimHash on $P(o)$ and $Q(q, w)$ for NNS over angular distance.
     - Main Results
       - $\Pr[h(P(o)) = h(Q(q, w))]$ is monotonic in $d_w(o, q)$ (Lemmas 3 and 4).
       - SL-ALSH and S2-ALSH solve the problem of NNS over $d_w$ in sublinear time (Theorems 3 and 4).
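
The two-step recipe (SphAT, then a standard LSH family) might be sketched as follows for the SimHash variant. This is a toy, hypothetical index, not the authors' implementation (their code is at https://github.com/1flei/aws_alsh): the class name `SimHashALSH`, the parameters `K` (hyperplanes per table) and `L` (number of tables), and the exact re-ranking step are all assumptions. Data objects are hashed once through $P(o)$; at query time the same hyperplanes are applied to $Q(q, w)$, and colliding candidates are re-ranked by the exact $d_w$.

```python
import math
import random
from collections import defaultdict


def sphat_data(o):
    """P(o) = [cos(o); sin(o)]; independent of the weights."""
    return [math.cos(x) for x in o] + [math.sin(x) for x in o]


def sphat_query(q, w):
    """Q(q, w) = [w (x) cos(q); w (x) sin(q)], element-wise products."""
    return ([wi * math.cos(x) for x, wi in zip(q, w)]
            + [wi * math.sin(x) for x, wi in zip(q, w)])


def gwsed(o, q, w):
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class SimHashALSH:
    """Toy S2-ALSH-style index: SimHash over SphAT vectors, L tables of K bits each."""

    def __init__(self, data, K=8, L=10, seed=0):
        rng = random.Random(seed)
        self.data = data
        dim = 2 * len(data[0])  # SphAT maps R^d to R^{2d}
        # K random Gaussian hyperplanes per table (sign random projections).
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(K)]
                       for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]
        for idx, o in enumerate(data):
            p = sphat_data(o)  # built once, before any query or weight vector is seen
            for table, planes in zip(self.tables, self.planes):
                table[self._key(p, planes)].append(idx)

    @staticmethod
    def _key(x, planes):
        # One bit per hyperplane: which side of the hyperplane x falls on.
        return tuple(dot(a, x) >= 0.0 for a in planes)

    def query(self, q, w):
        qv = sphat_query(q, w)  # the weights enter only on the query side
        candidates = set()
        for table, planes in zip(self.tables, self.planes):
            candidates.update(table.get(self._key(qv, planes), []))
        if not candidates:
            return None  # no collision in any table
        # Re-rank the candidates by the exact GWSED.
        return min(candidates, key=lambda i: gwsed(self.data[i], q, w))


rng = random.Random(42)
data = [[rng.random() for _ in range(6)] for _ in range(300)]
index = SimHashALSH(data)
print(index.query(data[7], [1.0] * 6))  # 7
```

In the usage line, the query is itself a data point and $w$ is all ones, so $Q(q, w) = P(q)$: the point collides with itself in every table and, having $d_w = 0$, is returned. Replacing `_key` with an E2LSH-style quantized projection $\lfloor (a^T x + b)/r \rfloor$ (with a hypothetical bucket width $r$) would give an SL-ALSH-style variant.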

  7. Datasets and Settings
     - Datasets
       - Mnist ($n$ = 60,000 and $d$ = 784)
       - Sift ($n$ = 1,000,000 and $d$ = 128)
       - Movielens ($n$ = 52,889 and $d$ = 150)
     - Five types of weight vector $w$:

       Type        Illustration
       Identical   all "1"
       Binary      uniformly distributed in $\{0, 1\}^d$
       Normal      $d$-dimensional normal distribution $\mathcal{N}(0, I)$
       Uniform     uniformly distributed in $[0, 1]^d$
       Negative    all "-1"
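
A minimal generator for the five weight types might look like the sketch below. The exact sampling used in the experiments is not given on the slide; in particular, reading "Binary" as uniform over $\{0, 1\}^d$ and "Uniform" as uniform over $[0, 1]^d$ is an assumption, and `make_weights` is a hypothetical helper.

```python
import random


def make_weights(kind, d, rng):
    """Return a weight vector w in R^d of the given type (types as in the table)."""
    if kind == "identical":
        return [1.0] * d
    if kind == "binary":
        return [float(rng.randint(0, 1)) for _ in range(d)]  # each w_i in {0, 1}
    if kind == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(d)]       # w ~ N(0, I)
    if kind == "uniform":
        return [rng.random() for _ in range(d)]              # each w_i in [0, 1)
    if kind == "negative":
        return [-1.0] * d
    raise ValueError(f"unknown weight type: {kind}")


rng = random.Random(0)
for kind in ["identical", "binary", "normal", "uniform", "negative"]:
    print(kind, [round(x, 2) for x in make_weights(kind, 5, rng)])
```

Only the "normal" type can produce mixed-sign weights, and "negative" reproduces the FNS setting from the Problem Definition slide.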

  8. Bucketing Experiments
     Figure: The best fraction of the dataset to scan to achieve a certain level of recall (lower is better).
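
The figure's metric can be made concrete with a small helper. This is one plausible reading, not the paper's exact evaluation protocol: objects are scanned in the order a method proposes them (e.g., bucket by bucket), and we record the smallest fraction of the $n$ objects scanned before the target share of the true $k$ nearest neighbors has been seen; "best" would then be the minimum of this fraction over the parameter settings tried. The function name `fraction_to_reach_recall` is hypothetical.

```python
import math


def fraction_to_reach_recall(scan_order, true_knn, n, target_recall):
    """Smallest fraction of the n objects scanned (in scan_order) before at least
    target_recall of the true k nearest neighbors are found; None if never reached."""
    truth = set(true_knn)
    # Number of true neighbors needed; the small epsilon guards against float round-up.
    need = math.ceil(target_recall * len(truth) - 1e-9)
    found = 0
    for scanned, idx in enumerate(scan_order, start=1):
        if idx in truth:
            found += 1
        if found >= need:
            return scanned / n
    return None


# n = 10 objects, true 2-NN = {3, 7}; a method proposes objects in this order.
order = [5, 3, 1, 7, 0, 2, 4, 6, 8, 9]
print(fraction_to_reach_recall(order, [3, 7], 10, 0.5))  # 0.2
print(fraction_to_reach_recall(order, [3, 7], 10, 1.0))  # 0.4
```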

  9. Conclusions
     - Demonstrate that there is no Asymmetric LSH family over $\mathbb{R}^d$ for the problem of NNS over $d_w$.
     - Introduce a novel SphAT from $\mathbb{R}^d$ to $\mathbb{R}^{2d}$:
       - SphAT is weight-oblivious;
       - $\Pr[h(P(o)) = h(Q(q, w))]$ is monotonic in $d_w(o, q)$.
     - Propose the first two sublinear time methods, SL-ALSH and S2-ALSH, for NNS over $d_w$.
     - Extensive experiments verify that SL-ALSH and S2-ALSH answer NNS queries in sublinear time and support various types of weight vectors.

  10. Poster Session
     [Poster #82: Tue Jun 11th 06:30–09:00 PM @ Pacific Ballroom]
     Thank you for your attention!
