Approximate Range Emptiness in Constant Time and Optimal Space


  1. Approximate Range Emptiness in Constant Time and Optimal Space
     Mayank Goswami, Allan Grønlund, Kasper Larsen, Rasmus Pagh
     Max-Planck Institute for Informatics, (MADALGO-Aarhus)², IT University of Copenhagen
     SODA 2015, San Diego

  2. Approximate Range Emptiness
     [Figure: number line from $0$ to $U$ with the points $x_1, x_2, \dots, x_i, \dots, x_n$ marked.]
     Input: a set $S$ of $n$ elements from $[U]$.
     M. Goswami, A. Grønlund, K. Larsen, R. Pagh (Max-Planck Institute for Informatics), Approximate Range Membership, SODA 2015, San Diego

  3. Approximate Range Emptiness
     [Figure: number line from $0$ to $U$ with the points $x_1, x_2, \dots, x_i, \dots, x_n$ and a query interval labeled "Empty?".]
     Input: a set $S$ of $n$ elements from $[U]$.
     Preprocess it to answer the query: given $[a, b]$, is $[a, b] \cap S \neq \emptyset$?

  7. Motivation: Exact versus Approximate Membership
     Membership: given a set $S = \{x_1, \dots, x_n\}$ from a universe $[U]$, preprocess the set to answer membership queries for a queried element $q$ (is $q \in S$?).
     The minimum space required is $B = \lg \binom{U}{n}$ bits. There exist data structures using $B + o(B)$ bits and $O(1)$ query time.
     Is there a reduction in space if we only want $\varepsilon$-approximate answers? Yes.
     Bloom filters¹: $O(n \lg(1/\varepsilon))$ bits of space, $O(k)$ query time, false positive rate (FPR) $\varepsilon$. Here $k$ is the number of hash functions used, and it depends on $\varepsilon$.
     Optimal Bloom filters (Pagh et al.): query time $O(1)$ irrespective of $\varepsilon$, and space usage $(1 + o(1))\, n \lg(1/\varepsilon)$ bits.
     ¹ Currently 4757 citations!
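To make the space and query-time figures on this slide concrete, here is a minimal Bloom filter sketch in Python. It is an illustration only, not the optimal structure of Pagh et al.: the class name `BloomFilter`, the use of `hashlib.blake2b` as the hash family, and the textbook sizing ($m = n \lg(1/\varepsilon) / \ln 2$ bits, $k \approx \lg(1/\varepsilon)$ hash functions) are assumptions made for the example.

```python
import hashlib
import math


class BloomFilter:
    """Textbook Bloom filter sized for n keys and target false positive rate eps."""

    def __init__(self, n, eps):
        # Assumed standard sizing: about 1.44 * n * lg(1/eps) bits, k = lg(1/eps) hashes.
        self.k = max(1, round(math.log2(1 / eps)))
        self.m = max(1, math.ceil(n * math.log2(1 / eps) / math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, x):
        # Derive k bit positions by salting one hash function with the index i.
        for i in range(self.k):
            digest = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, x):
        for p in self._positions(x):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, x):
        # O(k) probes; never a false negative, false positives with rate about eps.
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(x))
```

After `bf = BloomFilter(n, eps)` and `bf.add(x)` for each key, the test `x in bf` always succeeds for inserted keys, while a non-member is reported present only with probability roughly `eps`. The query cost grows with $k$, which is the dependence on $\varepsilon$ that the optimal variant removes.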

  9. Approximate Range Emptiness
     Range queries are more frequent in real life than membership queries.
     Range emptiness: the minimum space required is $B = \lg \binom{U}{n}$ bits. This follows from membership.
     Alstrup et al.: $O(n)$ words $= O(n \lg U)$ bits, $O(k)$ reporting time, where $k$ is the number of reported points. The same structure answers emptiness (does there exist a point inside $[a, b]$?) in $O(1)$ time: stop at the first reported point.
     Approximate range emptiness (ARE): false negatives are not allowed; a fraction $\varepsilon$ of false positives is allowed. Of all the $U^2/2$ range queries, only an $\varepsilon$ fraction may have false positives.

  10. Main Question
     Can we reduce the space usage for range queries below $n \lg U$ bits by allowing approximate answers, as in the step from exact to approximate membership queries?

  12. One way to do ARE
     Suppose we want a data structure that only answers queries on ranges of size at most $L < U$.
     One way to answer an approximate range emptiness query on $[a, b]$:
     Build a Bloom filter on $S$ with FPR $\varepsilon / L$.
     For every $x \in [a, b]$, run a membership query on the Bloom filter.
     By a union bound over the at most $L$ membership queries, the false positive rate is at most $\varepsilon$.
     This uses about $n \lg(L/\varepsilon)$ bits of space and achieves a query time of $O(r)$, where $r$ is the size of the range.
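The baseline on this slide can be sketched as follows. This is a rough illustration under stated assumptions: the helper names `make_filter`, `_positions`, and `range_maybe_nonempty` are hypothetical, the filter is a plain textbook Bloom filter (one byte per bit, for readability), and `hashlib.blake2b` stands in for the hash functions.

```python
import hashlib
import math


def _positions(x, k, m):
    # k bit positions for key x, obtained by salting one hash with the index i.
    for i in range(k):
        d = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8).digest()
        yield int.from_bytes(d, "big") % m


def make_filter(points, fpr):
    """Build a Bloom filter over `points` with target false positive rate `fpr`."""
    k = max(1, round(math.log2(1 / fpr)))
    m = max(1, math.ceil(len(points) * math.log2(1 / fpr) / math.log(2)))
    bits = bytearray(m)  # one byte per bit: clear, but not space efficient
    for x in points:
        for p in _positions(x, k, m):
            bits[p] = 1
    return bits, k


def range_maybe_nonempty(filt, a, b):
    """Probe every x in [a, b]; O(r) membership queries for a range of size r."""
    bits, k = filt
    m = len(bits)
    return any(all(bits[p] for p in _positions(x, k, m)) for x in range(a, b + 1))
```

To support ranges of size at most `L` with overall false positive rate `eps`, the filter would be built as `make_filter(points, eps / L)`. Each of the at most `L` probes errs with probability about `eps / L`, so the union bound gives at most `eps` for the whole range. A range that contains a point of $S$ is always reported non-empty.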

  14. Results: Lower Bounds

  15. Lower Bounds
     We first show that the space-error tradeoff cannot be improved significantly.
     Theorem: Any data structure for the ARE problem answering all query intervals of a fixed length $L \le U/(5n)$ with false positive rate $\varepsilon > 0$ must use at least
     $$ s \ge n \lg\left( \frac{L^{1 - O(\varepsilon)}}{\varepsilon} \right) - O(n) $$
     bits of space.

  16. Extension to Two-Sided Errors
     Theorem: Any data structure for ARE with two-sided error rate $\varepsilon$ must use
     $$ s \ge n \lg(L/\varepsilon) - O(n) \text{ bits when } 0 < \varepsilon < \frac{1}{\lg U}, $$
     $$ s = \Omega\left( \frac{n \lg(L \lg U)}{\lg_{1/\varepsilon} \lg U} \right) \text{ bits when } \frac{1}{\lg U} \le \varepsilon \le \frac{1}{2} - \Omega(1). $$

  17. Results: Upper Bounds

  19. Upper Bounds
     There is a data structure $D_a$ for the ARE problem that answers range emptiness for all ranges of length at most $L$, uses $n \lg(L/\varepsilon) + O(n \lg^{\delta}(L/\varepsilon))$ bits of space for any desired constant $\delta$, and has a false positive probability of at most $\varepsilon$.
     There is a data structure $D_e$ that uses $n \lg(U/n) + o(n \lg^{\delta}(U/n))$ bits² and answers exact range reporting in $O(k)$ time and exact emptiness in $O(1)$ time.
     ² The previous best used $O(n \lg U)$ bits.

  22. Upper Bounds: Reduction of Universe
     $f : [U] \to [R]$, where $R = nL/\varepsilon$.
     On $[R]$ we use the exact range emptiness/reporting data structure.
     This would give constant query time using $n \lg(R/n) + n \lg^{\delta}(R/n)$ bits. Since $R/n = L/\varepsilon$, that is $n \lg(L/\varepsilon) + n \lg^{\delta}(L/\varepsilon)$ bits, which would be optimal.
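A small sketch of the universe-reduction idea follows. The slides do not specify $f$, so the choice below is an assumption: $f(x) = (g(\lfloor x/R \rfloor) + x) \bmod R$, where $g$ assigns a random offset to each block of $R$ consecutive universe elements, so that an interval inside one block maps to a cyclic interval in $[R]$. The class name `HashedRangeEmptiness`, the per-block offset table, and the sorted list standing in for the exact structure $D_e$ (giving $O(\lg n)$ rather than $O(1)$ queries) are all simplifications for illustration.

```python
import bisect
import math
import random


class HashedRangeEmptiness:
    """Approximate range emptiness via universe reduction (illustrative sketch)."""

    def __init__(self, points, max_len, eps, seed=0):
        assert 0 < eps <= 1
        n = max(1, len(points))
        self.max_len = max_len
        self.R = math.ceil(n * max_len / eps)  # reduced universe size R = nL/eps
        self._rng = random.Random(seed)
        self._offsets = {}  # block index -> random offset; hypothetical stand-in for g
        # Stand-in for the exact structure on [R]: a sorted list of hashed points.
        self.hashed = sorted({self._f(x) for x in points})

    def _g(self, block):
        if block not in self._offsets:
            self._offsets[block] = self._rng.randrange(self.R)
        return self._offsets[block]

    def _f(self, x):
        # Assumed locality-preserving map: shift each block cyclically by g(block).
        return (self._g(x // self.R) + x) % self.R

    def _exact_nonempty(self, lo, hi):
        i = bisect.bisect_left(self.hashed, lo)
        return i < len(self.hashed) and self.hashed[i] <= hi

    def maybe_nonempty(self, a, b):
        assert 0 <= a <= b and b - a + 1 <= self.max_len
        x = a
        while x <= b:  # at most two blocks, since max_len <= R
            y = min(b, (x // self.R + 1) * self.R - 1)
            lo, hi = self._f(x), self._f(y)
            if lo <= hi:
                hit = self._exact_nonempty(lo, hi)
            else:  # the image wraps around modulo R
                hit = self._exact_nonempty(lo, self.R - 1) or self._exact_nonempty(0, hi)
            if hit:
                return True
            x = y + 1
        return False
```

Under this assumed $f$, a point of $S$ inside the query range always lands inside the image of the range, so there are no false negatives. A point in a different block lands in the image with probability about $L/R$, so a union bound over the $n$ points gives a false positive probability of roughly $nL/R = \varepsilon$. The offset table is kept only for clarity; a space-efficient version would need a compactly represented hash function in its place.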
