
Fast Prefix Search in Little Space, with Applications

Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, Sebastiano Vigna. ESA 2010.


  1. Fast Prefix Search in Little Space, with Applications. Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, Sebastiano Vigna. ESA 2010.

  2. Talk overview

  3. Talk overview: 1. What? 2. Why? 3. What else? 4. How? 5. Then what?

  4. 1. What?

  5. 1. What?
     ✤ Standard (RAM) model, word size w.
     ✤ Static set S of n strings.
     ✤ Prefix query: Given a string p, what strings in S have p as a prefix?
       ‣ Report all matching strings.

  6. 1. What?
     ✤ Standard (RAM) model, word size w.
     ✤ Static set S of n strings.
     ✤ Prefix query: Given a string p, what strings in S have p as a prefix?
       ‣ Report ranks of all matching strings.
       ‣ Index: Assume strings stored sorted.

  7. 1. What?
     ✤ Standard (RAM) model, word size w.
     ✤ Static set S of n strings, w bits each.
     ✤ Prefix query: Given a string p, what strings in S have p as a prefix?
       ‣ Report ranks of all matching strings.
       ‣ Index: Assume strings stored sorted.
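The prefix query defined on this slide can be illustrated with a plain binary-search baseline over the sorted strings. This is only a sketch of what the query returns (a range of ranks), not the data structure presented in the talk; the function name `prefix_rank_range` and the use of Python `str` values instead of w-bit strings are assumptions made for readability.

```python
from bisect import bisect_left


def prefix_rank_range(sorted_strings, p):
    """Return (lo, hi) so that sorted_strings[lo:hi] are exactly the strings with prefix p.

    Baseline only: two binary searches over the stored strings,
    i.e. O(log n) string comparisons per query.
    """
    # First position whose string is >= p; every match sits at or after it.
    lo = bisect_left(sorted_strings, p)
    # Among strings >= p, those starting with p form a contiguous block
    # at the front, so the end of the block can be found by binary search.
    left, right = lo, len(sorted_strings)
    while left < right:
        mid = (left + right) // 2
        if sorted_strings[mid].startswith(p):
            left = mid + 1
        else:
            right = mid
    return lo, left


# Hypothetical example data, stored sorted as the slide assumes.
S = sorted(["Lima", "Liverpool", "Livorno", "London", "Lyon"])
lo, hi = prefix_rank_range(S, "Liv")
print(lo, hi, S[lo:hi])
```

An empty result is reported as a range with `lo == hi`.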

  8. 2. Why?

  9. 2. Why? ALGO Liverp*

  10. 2. Why?
      ✤ OLAP in a nutshell:
        ‣ Dimensions D = Set<rooted tree>.
        ‣ FactTable F = List<node from each D, number>.
        ‣ Query: Given subtrees of D, sum up the numbers in F where all nodes are contained in the subtrees.
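The OLAP query on this slide can be sketched in a few lines. The encoding below is an assumption, not something the slide states: each tree node is written as its root-to-node path (a tuple of labels), so that "node lies in the subtree rooted at r" becomes "r's path is a prefix of the node's path". The names `in_subtree` and `olap_sum` and the sample facts are hypothetical.

```python
def in_subtree(node, root):
    """True if `node` lies in the subtree rooted at `root`.

    Both are root-to-node paths, so membership is a prefix test.
    """
    return node[:len(root)] == root


def olap_sum(facts, subtree_roots):
    """Sum the numbers of all facts whose node in every dimension
    falls inside the queried subtree of that dimension."""
    total = 0
    for nodes, value in facts:
        if all(in_subtree(n, r) for n, r in zip(nodes, subtree_roots)):
            total += value
    return total


# Hypothetical fact table with two dimensions: place and time.
facts = [
    ((("Europe", "UK", "Liverpool"), ("2010", "Q3")), 5),
    ((("Europe", "UK", "London"), ("2010", "Q4")), 7),
    ((("Europe", "Italy", "Milan"), ("2010", "Q3")), 11),
    ((("Asia", "Japan", "Tokyo"), ("2009", "Q1")), 13),
]

# All facts located in the UK during 2010.
print(olap_sum(facts, (("Europe", "UK"), ("2010",))))
```

This linear scan only shows the semantics of the query; under the path encoding, each per-dimension test is a prefix query of the kind the talk is about.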

  11. 2. Why? [Figure labels: slow memory, fast memory, index, data (sorted).]

  12. 3. What else?
      ✤ Special case of range query
        ‣ return rank_S([a;b])
      ✤ Generalizes point query
        ‣ return rank_S({x})
      ✤ No easier than existence queries
        ‣ return S ∩ [a;b] ≠ ∅
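The three relationships on this slide can be made concrete for w-bit keys stored as sorted integers. The sketch below is a binary-search baseline under that assumption; all function names are hypothetical, and a rank answer is represented as a half-open index range `(lo, hi)`.

```python
from bisect import bisect_left, bisect_right


def range_rank(sorted_keys, a, b):
    """Ranks of the keys in [a, b], as a half-open index range (lo, hi)."""
    return bisect_left(sorted_keys, a), bisect_right(sorted_keys, b)


def prefix_rank(sorted_keys, prefix_bits, prefix_len, w):
    """Prefix query as a special case of a range query.

    The w-bit keys starting with the given prefix are exactly those in
    [prefix followed by 0s, prefix followed by 1s].
    """
    pad = w - prefix_len
    a = prefix_bits << pad
    b = a | ((1 << pad) - 1)
    return range_rank(sorted_keys, a, b)


def point_rank(sorted_keys, x, w):
    """Point query as a prefix query whose prefix is the whole key."""
    return prefix_rank(sorted_keys, x, w, w)


def exists_in_range(sorted_keys, a, b):
    """Existence query answered from a range-rank answer."""
    lo, hi = range_rank(sorted_keys, a, b)
    return lo < hi


# Hypothetical 4-bit key set.
keys = [0b0010, 0b0101, 0b0110, 0b0111, 0b1100]
print(prefix_rank(keys, 0b01, 2, 4))
```

Because `exists_in_range` is obtained from `range_rank` with one comparison, any lower bound for existence queries carries over to range rank, which is the sense of "no easier than existence queries".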

  13. Results on query time (space O(nw) bits). Table label: range.

  14. Results on query time (space O(nw) bits). Table axes: existence / rank, point / range.

  15. Results on query time (space O(nw) bits). Point existence: O(1) [FKS, FOCS ’82].

  16. Results on query time (space O(nw) bits). Point existence: O(1) [FKS, FOCS ’82]. Rank: O(log w) [vEB, FOCS ’75], Ω(log w) [PT, STOC ’06]. [Inset: first page of "Time-Space Trade-Offs for Predecessor Search (Extended Abstract)", Mihai Pătraşcu and Mikkel Thorup.]

  17. Results on query time (space O(nw) bits). Point existence: O(1) [FKS, FOCS ’82]. Range existence: O(1) [ABR, STOC ’01]. Rank: O(log w) [vEB, FOCS ’75], Ω(log w) [PT, STOC ’06]. [Inset: first page of "Optimal Static Range Reporting in One Dimension", Stephen Alstrup, Gerth Stølting Brodal, Theis Rauhe; main result: a static data structure with space cost O(n) that supports the query FindAny in constant time.]

  18. Results on query time (space O(nw) bits). Point existence: O(1) [FKS, FOCS ’82]. Range existence: O(1) [ABR, STOC ’01]. Rank: O(log w) [vEB, FOCS ’75], Ω(log w) [PT, STOC ’06].

  19. Weak queries
      ✤ Guarantee output only on some inputs
        ‣ Rank of prefixes of strings in S, in O(1) time [ABR ’01].
        ‣ Represent a function with domain S, without storing S [SS ’89], [CKRT ’04].
        ‣ Rank of any string in S, using O(n log log w) bits of space [BBPV ’09].
      [Insets: first pages of "Optimal Static Range Reporting in One Dimension" (Alstrup, Brodal, Rauhe); "The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables" (Chazelle, Kilian, Rubinfeld, Tal); "Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses" (Belazzougui, Boldi, Pagh, Vigna).]

  20. Weak queries
      ✤ Guarantee output only on some inputs
        ‣ Rank of prefixes of strings in S, in O(1) time [ABR ’01].
        ‣ Represent a function with domain S, without storing S [SS ’89], [CKRT ’04].
        ‣ Rank of any string in S, using O(n log log w) bits of space [BBPV ’09].
      [Insets: first pages of "The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables" (Chazelle, Kilian, Rubinfeld, Tal); "Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses" (Belazzougui, Boldi, Pagh, Vigna).]

  21. Weak queries
      ✤ Guarantee output only on some inputs
        ‣ Rank of prefixes of strings in S, in O(1) time [ABR ’01].
        ‣ Represent a function with domain S, without storing S [SS ’89], [CKRT ’04].
        ‣ Rank of any string in S, using O(n log log w) bits of space [BBPV ’09].
      [Inset: first page of "Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses" (Belazzougui, Boldi, Pagh, Vigna).]
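The contract of a weak query ("guarantee output only on some inputs") can be illustrated with a toy rank function that keeps no strings at all. This is not the construction from any of the cited papers and does not meet their space bounds: it is a brute-force seed search for a collision-free hash into a table of n² slots, assuming the input strings are distinct. The names `build_weak_rank` and `_slot` are hypothetical.

```python
import hashlib


def _slot(key, seed, m):
    """Hash `key` to a slot in [0, m) using a seeded SHA-256 digest."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % m


def build_weak_rank(sorted_strings):
    """Return a function giving the rank of any string in S.

    Only ranks are stored, never the strings, so for a string outside S
    the returned value is arbitrary (that is the "weak" part).
    Assumes the strings are distinct; otherwise the seed search never ends.
    """
    n = len(sorted_strings)
    m = max(1, n * n)  # quadratic table: a random seed is collision-free with probability > 1/2
    seed = 0
    while True:
        slots = [_slot(s, seed, m) for s in sorted_strings]
        if len(set(slots)) == n:  # no two strings of S share a slot
            break
        seed += 1
    table = [0] * m
    for rank, slot in enumerate(slots):
        table[slot] = rank

    def weak_rank(x):
        return table[_slot(x, seed, m)]

    return weak_rank


# Hypothetical example data.
S = sorted(["Lima", "Liverpool", "Livorno", "London", "Lyon"])
weak_rank = build_weak_rank(S)
print([weak_rank(s) for s in S])
```

A caller that needs a definite answer for arbitrary input can compare the query against the string stored at the returned rank, which costs one access to the sorted table.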
