Sublinear Time Nearest Neighbor Search over Generalized Weighted Space


SLIDE 1

Sublinear Time Nearest Neighbor Search over Generalized Weighted Space

Source code: https://github.com/1flei/aws_alsh

Yifan Lei, Qiang Huang, Mohan S. Kankanhalli, Anthony K. H. Tung

School of Computing, National University of Singapore

2019/6/11 1

SLIDE 2

Applications

- Nearest Neighbor Search (NNS) is widely used
- Example: booking a hotel for ICML 2019
  - Consider the hotel conditions: price, distance to the convention centre, and rating
  - Query $q$: a hotel that the user booked before and found excellent
  - Weight vector $w$: different users have different preferences over the hotel conditions, which lead to different choices of hotels

            Price   Distance   Rating
Hotel 1     400     8          10
Hotel 2     350     6          8
Hotel 3     250     9          8
Hotel 4     200     6          6

            Price   Distance   Rating
Hotel $q$   300     7          10

Weight vector                   Nearest hotel
$w = (0.001, 1, 1)$        →    Hotel 2
$w = (0, 1, 3)$            →    Hotel 1
$w = (0.001, -1, 1)$       →    Hotel 3
$w = (-0.001, -1, -1)$     →    Hotel 4
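The hotel example can be checked by a brute-force scan. The sketch below is only an illustration: the function names `gwsed` and `nearest_hotel` are hypothetical, and the distance used is the weighted squared Euclidean distance that the next slide defines, i.e. the sum over dimensions of weight times squared difference.

```python
# Hotel example from the slide: (price, distance, rating) for each hotel.
hotels = {
    "Hotel 1": (400, 8, 10),
    "Hotel 2": (350, 6, 8),
    "Hotel 3": (250, 9, 8),
    "Hotel 4": (200, 6, 6),
}
query = (300, 7, 10)  # the hotel q that the user liked before


def gwsed(o, q, w):
    """Weighted squared Euclidean distance: sum_i w_i * (o_i - q_i)^2."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


def nearest_hotel(w):
    """Linear scan for the hotel minimizing the weighted distance to the query."""
    return min(hotels, key=lambda name: gwsed(hotels[name], query, w))


# The four weight vectors listed on the slide.
weights = [(0.001, 1, 1), (0, 1, 3), (0.001, -1, 1), (-0.001, -1, -1)]
results = [nearest_hotel(w) for w in weights]
print(results)
```

Each weight vector selects a different hotel, which is the point of the slide: the same query yields different nearest neighbors once the weights change, including when some weights are negative.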

SLIDE 3

Problem Definition

- Given
  - A dataset $\mathcal{D}$ of $n$ data objects in $\mathbb{R}^d$
  - A query $q \in \mathbb{R}^d$ with a weight vector $w \in \mathbb{R}^d$
  - Measure: the Generalized Weighted Square Euclidean Distance (GWSED) $d_w$

    $$d_w(o, q) = \sum_{i=1}^{d} w_i (o_i - q_i)^2$$

- Nearest Neighbor Search (NNS) over $d_w$
  - To find $o^* \in \mathcal{D}$ s.t. $o^* = \arg\min_{o \in \mathcal{D}} d_w(o, q)$
- This problem is fundamental
  - Furthest Neighbor Search (FNS) and Maximum Inner Product Search (MIPS) can be reduced to NNS over $d_w$
  - e.g., $w_i = -1, \forall i \implies \arg\min_{o \in \mathcal{D}} d_w(o, q) = \arg\max_{o \in \mathcal{D}} \lVert o - q \rVert$
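The reduction from Furthest Neighbor Search can be illustrated with a linear scan. This is a minimal sketch on random data; the helper names (`gwsed`, `nns_over_dw`, `furthest_neighbor`) and the data sizes are assumptions made for the example, not part of the original slides.

```python
import random


def gwsed(o, q, w):
    """Weighted squared Euclidean distance: sum_i w_i * (o_i - q_i)^2."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


def nns_over_dw(data, q, w):
    """Index of the object minimizing the weighted distance (linear scan)."""
    return min(range(len(data)), key=lambda i: gwsed(data[i], q, w))


def furthest_neighbor(data, q):
    """Index of the object maximizing the squared Euclidean distance to q."""
    return max(
        range(len(data)),
        key=lambda i: sum((oi - qi) ** 2 for oi, qi in zip(data[i], q)),
    )


rng = random.Random(0)
d = 5
data = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(200)]
q = [rng.uniform(-1, 1) for _ in range(d)]

# With every weight set to -1, minimizing the weighted distance is the same
# as maximizing the ordinary (squared) Euclidean distance.
fns_via_nns = nns_over_dw(data, q, [-1.0] * d)
fns_direct = furthest_neighbor(data, q)
print(fns_via_nns, fns_direct)
```

Maximizing the squared distance and maximizing the distance itself select the same object, so the comparison uses squared distances to avoid an unnecessary square root.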

SLIDE 4

Background and Motivations

- Locality-Sensitive Hashing (LSH)
  - Sublinear time for Near Neighbor Search
  - Insight: construct a hash function $h$ s.t. $\Pr[h(o) = h(q)]$ is monotonic in $\mathrm{Dist}(o, q)$
  - Hidden condition: $\mathrm{Dist}(o, q)$ must be a metric
- LSH schemes cannot solve NNS over $d_w$ directly ($d_w$ is no longer a metric if some $w_i < 0$)
- Before this work, there was no sublinear-time method for this problem
- Motivations
  - Similar to $d_w$, the inner product (i.e., $o^T q$) is also not a metric
  - However, Shrivastava & Li (2014) introduced a sublinear-time method based on Asymmetric LSH, which constructs $P(o)$ and $Q(q)$ for data objects $o \in \mathcal{D}$ and each query $q$, respectively
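A small numeric check shows why negative weights break the metric requirement. The points and weights below are made up for illustration, and `gwsed` is a hypothetical helper name.

```python
def gwsed(o, q, w):
    """Weighted squared Euclidean distance: sum_i w_i * (o_i - q_i)^2."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


w = (-1.0, 1.0)  # one negative weight
q = (0.0, 0.0)

# Non-negativity fails: the "distance" between two distinct points is negative.
d_negative = gwsed((1.0, 0.0), q, w)

# Identity of indiscernibles fails: two distinct points are at "distance" zero.
d_zero = gwsed((1.0, 1.0), q, w)

print(d_negative, d_zero)
```

Both violations arise as soon as a single weight is negative, which is why a standard LSH family, which expects a metric, cannot be applied to this measure directly.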

SLIDE 5

Spherical Asymmetric Transformation

- Negative result:
  - There is no Asymmetric LSH family over $\mathbb{R}^d$ for NNS over $d_w$ (Lemma 1 and Theorem 2)
- Spherical Asymmetric Transformation (SphAT): $\mathbb{R}^d \to \mathbb{R}^{2d}$

    $$P(o) = [\mathrm{COS}(o); \mathrm{SIN}(o)], \qquad Q(q, w) = [w \otimes \mathrm{COS}(q); w \otimes \mathrm{SIN}(q)]$$

  - where $w \otimes \mathrm{COS}(q) = (w_1 \cos q_1, w_2 \cos q_2, \ldots, w_d \cos q_d)$
- Properties of SphAT:
  - $d_w(o, q) \sim$ the Euclidean distance (or angular distance) between $P(o)$ and $Q(q, w)$
  - SphAT is weight-oblivious (because $P(\cdot)$ is independent of $w$) $\implies$ the index can be built before $q$ and $w$ are known

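The link between SphAT and the weighted distance can be checked numerically. By the identity cos(a)cos(b) + sin(a)sin(b) = cos(a - b), the inner product of P(o) and Q(q, w) equals the sum of w_i cos(o_i - q_i), and for small differences the expansion cos(x) ≈ 1 - x²/2 turns this into sum(w) - d_w(o, q)/2. The sketch below is one reading of the "~" on the slide rather than the paper's exact lemma; the function names and sample values are hypothetical.

```python
import math


def sphat_data(o):
    """P(o) = [cos(o_1), ..., cos(o_d), sin(o_1), ..., sin(o_d)]; no weights involved."""
    return [math.cos(x) for x in o] + [math.sin(x) for x in o]


def sphat_query(q, w):
    """Q(q, w) = [w_1 cos(q_1), ..., w_d cos(q_d), w_1 sin(q_1), ..., w_d sin(q_d)]."""
    cos_part = [wi * math.cos(x) for x, wi in zip(q, w)]
    sin_part = [wi * math.sin(x) for x, wi in zip(q, w)]
    return cos_part + sin_part


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Made-up sample values; the differences o_i - q_i are kept small on purpose.
o = [0.10, -0.20, 0.05]
q = [0.12, -0.25, 0.00]
w = [1.0, -2.0, 0.5]

inner = dot(sphat_data(o), sphat_query(q, w))
cos_sum = sum(wi * math.cos(oi - qi) for oi, qi, wi in zip(o, q, w))
d_w = sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))
approx = sum(w) - d_w / 2  # second-order Taylor approximation of cos_sum

print(inner, cos_sum, approx)
```

Because P(o) has squared norm d for every object, a larger inner product with Q(q, w) also means a smaller Euclidean distance and a smaller angle, which is what lets Euclidean or angular LSH schemes be applied after the transformation.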
SLIDE 6

Two Proposed Methods

- SL-ALSH = SphAT + E2LSH
  - SphAT: $\arg\min_{o \in \mathcal{D}} d_w(o, q) \Rightarrow \arg\min_{o \in \mathcal{D}} \lVert P(o) - Q(q, w) \rVert$
  - Apply E2LSH on $P(o)$ and $Q(q, w)$ for NNS over Euclidean distance
- S2-ALSH = SphAT + SimHash
  - SphAT: $\arg\min_{o \in \mathcal{D}} d_w(o, q) \Rightarrow \arg\max_{o \in \mathcal{D}} \frac{P(o)^T Q(q, w)}{\lVert P(o) \rVert \, \lVert Q(q, w) \rVert}$
  - Apply SimHash on $P(o)$ and $Q(q, w)$ for NNS over Angular distance
- Main Results
  - $\Pr[h(P(o)) = h(Q(q, w))]$ is monotonic in $d_w(o, q)$ (Lemmas 3 and 4)
  - SL-ALSH and S2-ALSH solve the problem of NNS over $d_w$ in sublinear time (Theorems 3 and 4)

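The S2-ALSH recipe (SphAT followed by SimHash) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: SimHash is taken as sign random projection with Gaussian hyperplanes, candidates are ranked by matching bits with a linear scan instead of being retrieved from hash tables (so the sketch is not itself sublinear), and the data range, sizes, and all function names are hypothetical.

```python
import math
import random


def sphat_data(o):
    """P(o) = [cos(o); sin(o)], independent of any weight vector."""
    return [math.cos(x) for x in o] + [math.sin(x) for x in o]


def sphat_query(q, w):
    """Q(q, w) = [w * cos(q); w * sin(q)], element-wise."""
    return ([wi * math.cos(x) for x, wi in zip(q, w)]
            + [wi * math.sin(x) for x, wi in zip(q, w)])


def make_simhash(dim, num_bits, rng):
    """Draw num_bits Gaussian hyperplanes; return a function vector -> bit tuple."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

    def code(x):
        return tuple(sum(a * xi for a, xi in zip(p, x)) >= 0 for p in planes)

    return code


def gwsed(o, q, w):
    """Weighted squared Euclidean distance: sum_i w_i * (o_i - q_i)^2."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


rng = random.Random(42)
d, n, num_bits = 8, 500, 64
data = [[rng.uniform(0.0, 1.0) for _ in range(d)] for _ in range(n)]

# Index phase: only P(o) is hashed, so no query or weight vector is needed yet.
code = make_simhash(2 * d, num_bits, rng)
index = [code(sphat_data(o)) for o in data]

# Query phase: hash Q(q, w) with the same hyperplanes, rank objects by the
# number of matching bits, then verify the top candidates with the true distance.
q = [rng.uniform(0.0, 1.0) for _ in range(d)]
w = [rng.gauss(0.0, 1.0) for _ in range(d)]
q_code = code(sphat_query(q, w))
matches = [sum(a == b for a, b in zip(c, q_code)) for c in index]
candidates = sorted(range(n), key=lambda i: -matches[i])[:50]
best = min(candidates, key=lambda i: gwsed(data[i], q, w))
print(best, gwsed(data[best], q, w))
```

The asymmetry is visible in the two phases: data objects go through `sphat_data` once at index time, while each query goes through `sphat_query` with its own weight vector. Swapping the sign projections for E2LSH-style quantized projections would give the corresponding SL-ALSH sketch.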
SLIDE 7

Datasets and Settings

- Datasets
  - Mnist ($n = 60{,}000$ and $d = 784$)
  - Sift ($n = 1{,}000{,}000$ and $d = 128$)
  - Movielens ($n = 52{,}889$ and $d = 150$)
- Five types of weight vector $w$

Type        Description
Identical   All "1"
Binary      Uniformly distributed in $\{0, 1\}^d$
Normal      $d$-dimensional normal distribution $\mathcal{N}(0, I)$
Uniform     Uniformly distributed in $[0, 1]^d$
Negative    All "-1"
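The five weight-vector types could be generated as in the sketch below. It assumes that "Binary" means each entry is drawn uniformly from {0, 1} and "Uniform" means each entry is drawn uniformly from the interval [0, 1]; the function name `make_weight` and the seed are hypothetical.

```python
import random


def make_weight(kind, d, rng):
    """Return a d-dimensional weight vector of the given type."""
    if kind == "identical":
        return [1.0] * d
    if kind == "binary":
        return [float(rng.randint(0, 1)) for _ in range(d)]  # entries in {0, 1}
    if kind == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(d)]  # independent N(0, 1) entries
    if kind == "uniform":
        return [rng.random() for _ in range(d)]  # entries in [0, 1)
    if kind == "negative":
        return [-1.0] * d
    raise ValueError(f"unknown weight type: {kind}")


rng = random.Random(0)
kinds = ("identical", "binary", "normal", "uniform", "negative")
weights = {kind: make_weight(kind, 128, rng) for kind in kinds}
for kind in kinds:
    print(kind, weights[kind][:4])
```

Only the "Normal" and "Negative" types produce negative weights, so those are the cases in which the distance stops being a metric.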

SLIDE 8

Bucketing Experiments

Figure: The smallest fraction of the dataset that must be scanned to achieve a given level of recall (lower is better).
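One plausible way to compute the quantity in this figure is sketched below. The slide does not describe the exact bucketing protocol, so this is an assumption: objects are scanned in some candidate order (for example, by decreasing number of hash collisions with the query), and the scan stops once the required share of the true nearest neighbors has been seen. The function name and the toy data are hypothetical.

```python
import math


def fraction_to_scan(ranking, true_neighbors, recall, n):
    """Smallest fraction of the n objects scanned, in `ranking` order, before
    at least `recall` of `true_neighbors` has been encountered."""
    needed = math.ceil(recall * len(true_neighbors))
    if needed == 0:
        return 0.0
    truth = set(true_neighbors)
    found = 0
    for scanned, obj in enumerate(ranking, start=1):
        if obj in truth:
            found += 1
            if found >= needed:
                return scanned / n
    return 1.0  # the target recall is never reached within the ranking


# Toy example: 10 objects, 4 true neighbors, made-up candidate order.
ranking = [7, 3, 9, 1, 4, 0, 2, 8, 5, 6]
true_neighbors = [3, 4, 0, 8]
half = fraction_to_scan(ranking, true_neighbors, 0.5, 10)
full = fraction_to_scan(ranking, true_neighbors, 1.0, 10)
print(half, full)
```

A better hashing scheme places the true neighbors earlier in the ranking, so the fraction returned for a given recall is lower.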


SLIDE 9

Conclusions

- Demonstrate that there is no Asymmetric LSH family over $\mathbb{R}^d$ for the problem of NNS over $d_w$
- Introduce a novel SphAT from $\mathbb{R}^d$ to $\mathbb{R}^{2d}$
  - SphAT is weight-oblivious
  - $\Pr[h(P(o)) = h(Q(q, w))]$ is monotonic in $d_w(o, q)$
- Propose the first two sublinear time methods, SL-ALSH and S2-ALSH, for NNS over $d_w$
- Extensive experiments verify that SL-ALSH and S2-ALSH answer the NNS queries in sublinear time and support various types of weight vectors.

SLIDE 10


Thank you for your attention!

Poster Session

[Poster #82: Tue Jun 11th 06:30-09:00 PM @Pacific Ballroom]