Tight Lower Bound for Comparison-Based Quantile Summaries

Pavel Veselý, University of Warwick, 8 April 2020. Based on joint work with Graham Cormode (Warwick).


  1. Tight Lower Bound for Comparison-Based Quantile Summaries
     Pavel Veselý, University of Warwick, 8 April 2020
     Based on joint work with Graham Cormode (Warwick)
     Powered by BeamerikZ

  2. Overview of the talk
     • Quantiles & Distributions
     • Big Data Algorithms
     • Streaming Model
     [Figure: plot with vertical axis ticks 0, 0.5, and 1, with the median marked]
     Pavel Veselý, Tight Lower Bound for Quantile Summaries

  6. Motivation: Monitoring Latencies of Web Requests
     Source: C. Masson, J.E. Rim, and H.K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12):2195–2205, 2019.
     Millions of observations
     • no need to store all observed latencies
     What does the distribution look like? What is the median latency?
     • Average latency is too high due to the ∼2% of very high latencies

  12. Streaming Model
      Motivation: monitoring latencies of requests
      Streaming model = one pass over data & limited memory
      Streaming algorithm
      • receives data in a stream, item by item
      • uses memory sublinear in N = stream length
      • at the end, computes the answer
      Challenges:
      • N very large & not known
      • Data independent
      • Stream ordered arbitrarily
      • No random access to data
      Main objective: space
      How to summarize the input?
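
The one-pass, limited-memory pattern described on this slide can be sketched as follows. This is a minimal illustration and not something from the talk: the function name `stream_summary` and the sample latencies are assumptions, and the statistics it tracks (count, mean, maximum) were chosen only because they need constant memory.

```python
from typing import Iterable, Tuple


def stream_summary(stream: Iterable[float]) -> Tuple[int, float, float]:
    """One pass over the stream with O(1) memory: count, mean, and maximum."""
    count = 0
    mean = 0.0
    maximum = float("-inf")
    for x in stream:                # items arrive one at a time
        count += 1
        mean += (x - mean) / count  # incremental mean, no stored history
        maximum = max(maximum, x)
    return count, mean, maximum     # the answer is computed at the end


if __name__ == "__main__":
    # A generator models the stream: no random access, length not known upfront.
    latencies = (ms for ms in [12.0, 15.0, 11.0, 950.0, 14.0])
    print(stream_summary(latencies))
```

A single very high latency dominates the mean in this toy stream, which is the kind of effect that motivates asking for the median instead.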

  16. Selection Problem & Streaming
      • Input: stream of N numbers
      • Goal: find the k-th smallest
        • e.g.: the median, 99th percentile
      • O(N)-time offline algorithm [Blum et al. ’73]
      • Streaming restrictions:
        • just one pass over the data
        • limited memory: o(N)
      No streaming algorithm for exact selection: Ω(N) space needed to find the median [Munro & Paterson ’80, Guha & McGregor ’07]
      What about finding an approximate median?
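
For contrast with the streaming setting, exact offline selection can be sketched as below. Note the assumptions: this is randomized quickselect (expected linear time), not the deterministic algorithm of Blum et al., and the function name `kth_smallest` is hypothetical. It keeps the whole input in memory, which is exactly what a streaming algorithm may not do.

```python
import random
from typing import List


def kth_smallest(items: List[float], k: int) -> float:
    """Return the k-th smallest item (1-indexed) using randomized quickselect."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    pool = list(items)  # full copy of the input: Theta(N) memory
    while True:
        pivot = random.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        if k <= len(smaller):
            pool = smaller                       # answer lies below the pivot
        elif k <= len(smaller) + len(equal):
            return pivot                         # the pivot is the answer
        else:
            k -= len(smaller) + len(equal)       # skip everything <= pivot
            pool = [x for x in pool if x > pivot]


if __name__ == "__main__":
    data = [5.0, 1.0, 4.0, 2.0, 3.0]
    print(kth_smallest(data, 3))  # the median of five items
```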

  22. Approximate Median & Quantiles
      How to define an approximate median?
      φ-quantile = ⌈φ · N⌉-th smallest element (φ ∈ [0, 1])
      • Median = 0.5-quantile
      • Quartiles = 0.25-, 0.5-, and 0.75-quantiles
      • Percentiles = 0.01-, 0.02-, …, 0.99-quantiles
      [Figure: sorted data with the 0.25-quantile, median, and 0.75-quantile marked]
      ε-approximate φ-quantile = any φ′-quantile for φ′ ∈ [φ − ε, φ + ε]
      • 0.01-approximate medians are the 0.49- and 0.51-quantiles (and items in between)
      ε-approximate selection:
      • query k-th smallest → return k′-th smallest for k′ = k ± εN
      Offline summary: sort data & select ∼ 1/(2ε) items
      • the minimum (0-quantile), the 2ε-quantile, the 4ε-quantile, …
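
The offline summary on this slide can be sketched as follows, under stated assumptions: the function names are hypothetical, the summary stores (rank, item) pairs, and it also keeps the maximum so that queries near the top rank stay within εN. Stored ranks are spaced by at most 2εN, so every query rank k has a stored rank within εN of it.

```python
import math
from typing import List, Tuple


def build_offline_summary(data: List[float], eps: float) -> List[Tuple[int, float]]:
    """Sort the data and keep roughly 1/(2*eps) items, as (rank, item) pairs."""
    if not data:
        raise ValueError("data must be non-empty")
    s = sorted(data)
    n = len(s)
    step = max(1, math.floor(2 * eps * n))  # gap between stored ranks
    ranks = list(range(1, n + 1, step))     # ranks 1, 1 + step, 1 + 2*step, ...
    if ranks[-1] != n:
        ranks.append(n)                     # assumption: also keep the maximum
    return [(r, s[r - 1]) for r in ranks]


def approx_select(summary: List[Tuple[int, float]], k: int) -> float:
    """Return the stored item whose rank is closest to k."""
    return min(summary, key=lambda pair: abs(pair[0] - k))[1]


def approx_quantile(summary: List[Tuple[int, float]], n: int, phi: float) -> float:
    """Answer a phi-quantile query by selecting rank ceil(phi * n)."""
    k = max(1, math.ceil(phi * n))
    return approx_select(summary, k)


if __name__ == "__main__":
    data = [float((i * 37) % 101) for i in range(1, 101)]  # 1..100 in scrambled order
    summary = build_offline_summary(data, eps=0.1)
    print(len(summary), approx_quantile(summary, len(data), 0.5))
```

Because the sketch sorts the full input first, it is an offline construction only. The streaming question is how to maintain something comparable in one pass.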

  25. ε-Approximate Quantile Summaries
      Data structure with two operations:
      • Update(x): x = new item from the stream
      • Quantile Query(φ): for φ ∈ [0, 1], return an ε-approximate φ-quantile
      Additional operations:
      • Rank Query(x): for item x, determine its rank = position in the ordering of the input
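
The interface above can be illustrated with a deliberately naive sketch. The class name `NaiveQuantileSummary` and its method names are assumptions; it stores every item, so it uses O(N) space and answers exactly (ε = 0) rather than achieving the small space a real summary aims for. The rank of x is taken here to be the number of stored items less than or equal to x, which is one possible reading of "position in the ordering".

```python
import bisect
import math
from typing import List


class NaiveQuantileSummary:
    """Illustrates the interface only: keeps all items, so space is O(N)."""

    def __init__(self) -> None:
        self._items: List[float] = []  # kept in sorted order

    def update(self, x: float) -> None:
        """Process a new item from the stream (uses only comparisons)."""
        bisect.insort(self._items, x)

    def quantile_query(self, phi: float) -> float:
        """Return the ceil(phi * N)-th smallest item seen so far."""
        if not self._items:
            raise ValueError("summary is empty")
        rank = max(1, math.ceil(phi * len(self._items)))
        return self._items[rank - 1]

    def rank_query(self, x: float) -> int:
        """Return the number of stored items that are <= x (an assumed convention)."""
        return bisect.bisect_right(self._items, x)


if __name__ == "__main__":
    summary = NaiveQuantileSummary()
    for latency in [5.0, 1.0, 4.0, 2.0, 3.0]:
        summary.update(latency)
    print(summary.quantile_query(0.5), summary.rank_query(4.0))
```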
