
Reconfigurable Inverted Index (Yusuke Matsui, Ryota Hinami, Shin’ichi Satoh)


  1. Reconfigurable Inverted Index. Yusuke Matsui (1), Ryota Hinami (2), Shin’ichi Satoh (1). (1) National Institute of Informatics, (2) The University of Tokyo. Slides: https://bit.ly/2P0KuW1

  2. Approximate nearest neighbor (ANN) search. Database vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N$ are added to an ANN system (hash table, trees, inverted index, etc.). Given a query $\mathbf{q} \in \mathbb{R}^D$, the system returns the nearest database vector ($\mathbf{x}_{74}$ in the slide's example): $\operatorname{argmin}_{n \in \{1, \dots, N\}} \|\mathbf{q} - \mathbf{x}_n\|_2^2$.
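As a point of reference for the problem statement above, here is a minimal pure-Python sketch of the exact (non-approximate) search that ANN systems try to speed up. The names `squared_l2` and `nearest_neighbor` and the toy data are illustrative assumptions, and indices are 0-based here while the slides count from 1.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_neighbor(q, X):
    """Return the index n that minimizes ||q - x_n||^2 by scanning every vector."""
    return min(range(len(X)), key=lambda n: squared_l2(q, X[n]))


# Toy database of three 2-D vectors (made up for illustration).
X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
q = [0.9, 1.2]
best = nearest_neighbor(q, X)
```

The scan touches every database vector, which is what hash tables, trees, and inverted indexes are designed to avoid.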

  5. Related work.
      - Locality-sensitive hashing (LSH): FALCONN [Andoni+, 15] [Razenshteyn+, 18]
      - Projection/tree-based: FLANN [Muja+, 14], Annoy [Bernhardsson, 18]
      - Graph traversal: NSW/HNSW on NMSLIB [Malkov+, 16] [Boytsov+, 13]
      - Product quantization (PQ): IVFPQ on Faiss [Jégou+, 11] [Johnson+, 17], etc., and our Reconfigurable Inverted Index

  6. Subset search problem. The ANN system now has to return $\operatorname{argmin}_{n \in \mathcal{S}} \|\mathbf{q} - \mathbf{x}_n\|_2^2$ for a query $\mathbf{q} \in \mathbb{R}^D$ and a set of target IDs $\mathcal{S}$.
      - Existing ANN systems are fast when searching all vectors, i.e., over $\mathcal{S} = \{1, \dots, N\}$.
      - However, it is hard to run the search over a subset $\mathcal{S} \subseteq \{1, \dots, N\}$, e.g., searching only among $\mathbf{x}_{1000}, \dots, \mathbf{x}_{2000}$.
      - Why? Systems are usually optimized for $\mathcal{S} = \{1, \dots, N\}$.

  8. There is a demand for subset search! We propose the Reconfigurable inverted index (Rii): ✓ subset search; ✓ performance comparable to IVFPQ (Faiss); ✓ 10 ms for billion-scale data.

  9. Reconfigurable inverted index (Rii): outline.
      - Preliminary: PQ linear scan (fast if $|\mathcal{S}|$ is small) and IVFPQ (fast if $|\mathcal{S}|$ is large)
      - Data structure
      - Search: cherry-pick between the two, so the runtime is always fast regardless of $|\mathcal{S}|$

  11. Preliminary: Product quantization (PQ) [Jégou+, TPAMI 11]. PQ compresses a vector into a short code. All database vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N$ are PQ-encoded beforehand. [Figure: example vectors in $\mathbb{R}^4$ are each PQ-encoded into a short code, giving codes for items 1, 2, …, N.]
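To make the encoding step concrete, the following is a minimal pure-Python sketch of PQ encoding. It assumes tiny hand-written codebooks rather than trained ones (codecs such as nanopq learn them from training vectors), and `pq_encode` and `codebooks` are illustrative names, not part of any library.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def pq_encode(x, codebooks):
    """Split x into M equal sub-vectors and replace each one by the ID of
    its nearest codeword in the codebook of that sub-space."""
    M = len(codebooks)
    ds = len(x) // M  # dimensionality of each sub-vector
    code = []
    for m in range(M):
        sub = x[m * ds:(m + 1) * ds]
        nearest = min(range(len(codebooks[m])),
                      key=lambda k: squared_l2(sub, codebooks[m][k]))
        code.append(nearest)
    return code


# Two sub-spaces (M=2) with two codewords each; values are made up.
codebooks = [
    [[0.0, 0.0], [5.0, 5.0]],  # codewords for dimensions 0-1
    [[1.0, 1.0], [0.0, 4.0]],  # codewords for dimensions 2-3
]
x = [0.3, -0.2, 0.2, 3.9]  # a vector in R^4
code = pq_encode(x, codebooks)
```

Here the 4-D vector is stored as two small integers, one per sub-space.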

  12. Preliminary: Product quantization (PQ) [Jégou+, TPAMI 11].
      - Subset search is possible at a cost linear in $|\mathcal{S}|$: for, e.g., $\mathcal{S} = \{2, 4, 5, 8\}$, the query $\mathbf{q} \in \mathbb{R}^D$ is linearly compared against the PQ codes of the items $n \in \mathcal{S}$, and the item minimizing the distance $d$ is returned.
      - The search is efficient only if $|\mathcal{S}|$ is small. [Plot: runtime vs. $|\mathcal{S}|$, up to $N$.]
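The linear scan over a subset can be sketched as follows, assuming the usual asymmetric distance computation: a per-query lookup table of query-to-codeword distances, summed over sub-spaces for each code. The function names, codebooks, and codes are illustrative assumptions, and IDs are 0-based.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def adc_table(q, codebooks):
    """table[m][k] = squared distance from the m-th sub-vector of q
    to the k-th codeword of sub-space m."""
    M = len(codebooks)
    ds = len(q) // M
    return [[squared_l2(q[m * ds:(m + 1) * ds], cw) for cw in codebooks[m]]
            for m in range(M)]


def pq_linear_scan(q, codes, codebooks, target_ids, topk=1):
    """Approximate top-k search restricted to target_ids; cost is linear
    in the number of target IDs."""
    table = adc_table(q, codebooks)
    scored = [(sum(table[m][k] for m, k in enumerate(codes[n])), n)
              for n in target_ids]
    scored.sort()
    return [n for _, n in scored[:topk]]


codebooks = [
    [[0.0, 0.0], [5.0, 5.0]],
    [[1.0, 1.0], [0.0, 4.0]],
]
codes = [[0, 0], [1, 0], [0, 1], [1, 1]]  # PQ codes of items 0..3
q = [4.9, 5.2, 0.1, 0.9]
subset_result = pq_linear_scan(q, codes, codebooks, [0, 2, 3], topk=2)
full_result = pq_linear_scan(q, codes, codebooks, range(len(codes)), topk=2)
```

Only the codes whose IDs are in the subset are touched, which is why this path is attractive when the subset is small.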

  13. Reconfigurable inverted index (Rii): outline, now including Evaluation. Preliminary (PQ linear scan, IVFPQ); Data structure; Search; Evaluation.

  14. Preliminary: Inverted index + PQ (IVFPQ) [Jégou+, TPAMI 11]. The current basic data structure for large-scale search. Subset search is possible only if $|\mathcal{S}|$ is large. [Figure: space partitioning with coarse centers $\mathbf{c}_k$, each holding a list of item IDs (e.g., 126, 225 and 13, 92, 188).]

  15. Preliminary: Inverted index + PQ (IVFPQ) [Jégou+, TPAMI 11]. Subset search with, e.g., $\mathcal{S} = \{13, 92, 105, \dots\}$ for a query $\mathbf{q} \in \mathbb{R}^D$:
      1. Find the closest space: $k^* = \operatorname{argmin}_k \|\mathbf{q} - \mathbf{c}_k\|_2^2$.
      2. Focus on the $k^*$-th space and accept only the items $n \in \mathcal{S}$ (in the figure, 13 and 92 are accepted from the list 13, 92, 188).
      3. Re-rank the accepted items via PQ linear scan.

  16. Preliminary: Inverted index + PQ (IVFPQ) [Jégou+, TPAMI 11], continued. Why is it slow for small $|\mathcal{S}|$? If $\mathcal{S}$ is small and its items are far away from the query, we might need to scan all items. [Plot: runtime vs. $|\mathcal{S}|$, up to $N$.]
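The three IVFPQ steps and the slowdown for small subsets can be illustrated with a small pure-Python sketch. Two assumptions are made for brevity: the re-ranking uses exact distances on raw vectors as a stand-in for the PQ linear scan, and the search keeps visiting the next-closest list until `topk` candidates are found. All names and data are hypothetical, with 0-based IDs.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def ivf_subset_search(q, centers, posting_lists, vectors, target_ids, topk=1):
    """Visit posting lists from the closest center outward, keep only items
    in target_ids, then re-rank the survivors.
    Returns (result IDs, number of posting lists visited)."""
    targets = set(target_ids)
    order = sorted(range(len(centers)), key=lambda k: squared_l2(q, centers[k]))
    candidates, visited = [], 0
    for k in order:
        visited += 1
        candidates.extend(n for n in posting_lists[k] if n in targets)
        if len(candidates) >= topk:
            break
    # Stand-in for the PQ re-ranking step: exact distances on raw vectors.
    candidates.sort(key=lambda n: squared_l2(q, vectors[n]))
    return candidates[:topk], visited


centers = [[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]]
vectors = [[0.0, 1.0], [1.0, 0.0], [9.0, 0.0],
           [10.0, 1.0], [19.0, 0.0], [21.0, 0.0]]
posting_lists = [[0, 1], [2, 3], [4, 5]]  # item IDs assigned to each center
q = [0.8, 0.1]

large_subset = ivf_subset_search(q, centers, posting_lists, vectors, range(6))
small_subset = ivf_subset_search(q, centers, posting_lists, vectors, [5])
```

With the full ID set the closest list already yields a result, whereas the single far-away target forces the loop through every list, which mirrors the worst case described on the slide.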

  18. Data structure.
      - Store (1) PQ codes linearly, and (2) IDs as an inverted index.
      - Either PQ linear scan or IVFPQ can be run on this single data structure.
      - Key: store the codes linearly (items 1, 2, …, 13, …, N). By contrast, in IVFPQ the PQ codes are also chunked along with the IDs, which is the natural layout.
      - This is a slight but critical change.
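A minimal sketch of this layout, assuming the coarse assignment of each item is already known: the codes live in one list indexed by item ID, and the inverted index holds IDs only. `build_index` and the dictionary keys are hypothetical names, and IDs are 0-based.

```python
def build_index(codes, assignments, num_lists):
    """Keep PQ codes in one linear list (position = item ID) and keep only
    item IDs in the posting lists of the inverted index."""
    posting_lists = [[] for _ in range(num_lists)]
    for n, k in enumerate(assignments):
        posting_lists[k].append(n)
    return {"codes": list(codes), "posting_lists": posting_lists}


codes = [[0, 0], [1, 0], [0, 1], [1, 1]]  # PQ codes of items 0..3
assignments = [0, 1, 0, 1]                # closest coarse center per item
index = build_index(codes, assignments, num_lists=2)

# Linear-scan view: fetch the codes of an arbitrary subset directly by ID.
subset_codes = [index["codes"][n] for n in [1, 3]]

# IVFPQ view: look up the IDs in one posting list, then fetch their codes.
cell_codes = [index["codes"][n] for n in index["posting_lists"][0]]
```

Because codes are addressed by ID, both access patterns work on the same arrays without duplicating data.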

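  20. Search. If $|\mathcal{S}|$ is small, run PQ linear scan; if $|\mathcal{S}|$ is large, run IVFPQ. [Figure: the linearly stored codes 1, 2, …, 13, …, N and the inverted index over coarse centers $\mathbf{c}_k$ for a query $\mathbf{q} \in \mathbb{R}^D$, each with a plot of runtime vs. $|\mathcal{S}|$, up to $N$.]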

  22. Search, continued. If $|\mathcal{S}|$ is small, run PQ linear scan; if $|\mathcal{S}|$ is large, run IVFPQ. [Figure: the same diagram as the previous Search slide, with an added "fetch" step.]

  24. Search.
      - Set a threshold $\theta$.
      - Key: switch between the two methods based on $|\mathcal{S}| \lessgtr \theta$.
      - If $|\mathcal{S}|$ is small (below $\theta$), use PQ linear scan.
      - If $|\mathcal{S}|$ is large (above $\theta$), use IVFPQ.
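The switching rule can be sketched as a small dispatcher. The function name, the stand-in search routines, and the choice of a strict `<` at the boundary are assumptions for illustration; the slides only state that the method is chosen by comparing the subset size with the threshold.

```python
def reconfigurable_search(q, target_ids, theta, linear_scan, ivfpq):
    """Dispatch on the subset size: small subsets go to the PQ linear scan,
    large ones to IVFPQ."""
    if len(target_ids) < theta:
        return linear_scan(q, target_ids)
    return ivfpq(q, target_ids)


# Stand-in search routines that only report which path was taken.
def fake_linear_scan(q, target_ids):
    return "pq_linear_scan"


def fake_ivfpq(q, target_ids):
    return "ivfpq"


small = reconfigurable_search([0.0], [3, 7], 100,
                              fake_linear_scan, fake_ivfpq)
large = reconfigurable_search([0.0], list(range(1000)), 100,
                              fake_linear_scan, fake_ivfpq)
```

Since each path is the faster one in its own regime, picking by subset size keeps the runtime low across the whole range.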

  26. Evaluation. SIFT1M ($N = 10^6$, $D = 128$). Results for top-$R$ search.

  27. Evaluation. SIFT1M ($N = 10^6$, $D = 128$). Results for top-$R$ search. Existing system: Annoy, forced to search a subset. The existing system is slow, especially when $|\mathcal{S}|$ is small. The proposed Rii is always fast regardless of $|\mathcal{S}|$ and $R$.

  28. Installation and usage. Code: https://github.com/matsui528/rii

          $ pip install rii

          import rii
          import nanopq

          # Xt (training vectors), X (database vectors), q (query), and
          # S (target IDs) are assumed to be defined beforehand.

          # Prepare a PQ/OPQ codec with M=32 sub-spaces
          codec = nanopq.PQ(M=32).fit(vecs=Xt)  # Trained using Xt

          # Instantiate a Rii class with the codec
          e = rii.Rii(fine_quantizer=codec)

          # Add vectors
          e.add_configure(vecs=X)

          # Search
          ids, dists = e.query(q=q, topk=3, target_ids=S)
          print(ids, dists)  # e.g., [7484 8173 1556] [15.062 15.385 16.169]
