
Moment-Based Quantile Sketches for Efficient Aggregation Queries



  1. Moment-Based Quantile Sketches for Efficient Aggregation Queries. Edward Gan, Jialin Ding, Kai Sheng Tai, Vatsal Sharan, Peter Bailis. Stanford University.

  2. Motivation: Monitoring production data streams. Billions of events per day of mobile app telemetry data (Android, iOS). Quantile query: query for the 99th percentile, Group By Operating System, Where Location = USA. [Figure: p99 latency over time for each operating system.] When there is a spike in response latency, we need to issue queries. Percentiles are targeted: a single metric, for specific sub-populations.

  3. Goal: Enabling fast quantile queries at scale. Query for the 99th percentile, Group By Operating System, Where Location = USA. Users expect interactive response; datasets are large (billions of events per day). Baseline: scan and sort billions of rows, with multi-second latencies. [Figure: Statistics, $\mu_i = \frac{1}{n} \sum_{x \in X} x^i$, plus Optimization, $\theta' = \theta - \nabla F(\theta) / \nabla^2 F(\theta)$; labels: Data Summaries, Scalable Queries.]
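The scan-and-sort baseline can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical row schema of `(os, location, latency_ms)` tuples and a nearest-rank percentile definition; neither is specified on the slide.

```python
from collections import defaultdict

def exact_percentile(values, percent):
    """Exact nearest-rank percentile: sorts every value, O(n log n)."""
    ordered = sorted(values)
    # Integer ceiling of percent * n / 100, clamped to at least rank 1.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def p99_by_os(rows, where_location):
    """rows: iterable of (os, location, latency_ms) tuples (hypothetical schema)."""
    groups = defaultdict(list)
    for os_name, location, latency_ms in rows:
        if location == where_location:      # Where Location = ...
            groups[os_name].append(latency_ms)  # Group By Operating System
    return {name: exact_percentile(vals, 99) for name, vals in groups.items()}

# Made-up data for illustration only.
rows = [("Android", "USA", float(i)) for i in range(1, 101)]
rows += [("iOS", "USA", float(2 * i)) for i in range(1, 101)]
rows += [("iOS", "Canada", 5000.0)]
result = p99_by_os(rows, "USA")
```

Every query touches and sorts all matching rows, which is what makes this approach slow at billions of rows.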

  4. Systems make use of summaries to scale. Raw values → Summary → Quantiles (99th percentile: 401 ms, 95th percentile: 197 ms, 50th percentile: 48 ms). Summaries represent a dataset using sublinear space (e.g. a histogram). Quantile estimates can be extracted from a quantile summary. Summaries are commonly used to avoid sorting large datasets.
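As one way to picture a summary, here is a minimal sketch of a fixed-bucket histogram that answers quantile queries by interpolating inside a bucket. The class name `HistogramSummary`, the bucket edges, and the linear interpolation are illustrative assumptions, not the structure used in the talk.

```python
from bisect import bisect_right

class HistogramSummary:
    """Fixed-bucket histogram: space depends on the bucket count, not on n."""

    def __init__(self, edges):
        self.edges = list(edges)                  # ascending bucket boundaries
        self.counts = [0] * (len(self.edges) - 1)

    def add(self, x):
        # Locate the bucket; out-of-range values are clamped to the end buckets.
        i = bisect_right(self.edges, x) - 1
        i = min(max(i, 0), len(self.counts) - 1)
        self.counts[i] += 1

    def quantile(self, q):
        """Estimate the q-quantile, assuming values are uniform within a bucket."""
        target = q * sum(self.counts)
        seen = 0
        for i, c in enumerate(self.counts):
            if c > 0 and seen + c >= target:
                lo, hi = self.edges[i], self.edges[i + 1]
                return lo + (hi - lo) * (target - seen) / c
            seen += c
        return self.edges[-1]

# Made-up latencies 0..999 ms, summarized in ten 100 ms buckets.
h = HistogramSummary(range(0, 1001, 100))
for v in range(1000):
    h.add(v)
median_estimate = h.quantile(0.5)
```

The estimate is only as fine as the bucket width, which is the usual trade-off between summary size and accuracy.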

  5. Pre-aggregating summaries reduces latency. Systems can pre-aggregate summaries for populations ahead of time. Example with data associated with day of week: the summaries for Day=Sat and Day=Sun are merged into a Day=Weekend summary (99th percentile: 105 ms, 95th percentile: 87 ms, 50th percentile: 40 ms). Mergeable summaries [1] can be combined without loss of accuracy, which improves query response time. [1] Agarwal et al., PODS '12.
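The merge step can be illustrated with the simplest mergeable summary, a histogram over shared bucket edges, where merging is element-wise addition of counts. This is a stand-in chosen for brevity, not the construction from Agarwal et al.; the edges and latency values are made up.

```python
from bisect import bisect_right

def build_counts(values, edges):
    """Summarize values as per-bucket counts over fixed, shared edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(max(bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[i] += 1
    return counts

def merge_counts(a, b):
    """Merge two summaries built over the same edges."""
    return [x + y for x, y in zip(a, b)]

edges = [0, 50, 100, 200, 400]          # hypothetical latency buckets (ms)
sat = [10, 60, 120, 30]                 # made-up Day=Sat latencies
sun = [45, 210, 390, 70, 20]            # made-up Day=Sun latencies

# Pre-aggregate once per day, then answer Day=Weekend from the two summaries.
weekend = merge_counts(build_counts(sat, edges), build_counts(sun, edges))
```

Here the merged summary is identical to the one built from the combined raw data, which is the sense in which merging loses no accuracy.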

  6. Challenge: aggregations bottlenecked by merge. Many attributes mean potentially more pre-aggregated subpopulations: App Version × OS Version × Location × Day × HW Make. 5 columns × 20 distinct values each = 20^5 = 3.2M combinations. Queries are bottlenecked when merging pre-aggregated summaries. Greenwald-Khanna (GK) sketch: an updatable equi-depth histogram. GK performance: 3 µs × 1 million merges = 3 seconds. How can we optimize quantile summaries for aggregation?
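The two figures on this slide follow from simple arithmetic, reproduced here as a quick check. The variable names are illustrative.

```python
columns = 5
distinct_values_per_column = 20

# One pre-aggregated summary per combination of attribute values.
combinations = distinct_values_per_column ** columns

merge_cost_us = 3            # per-merge cost quoted for the GK sketch
merges = 1_000_000
total_seconds = merge_cost_us * merges / 1_000_000  # microseconds to seconds
```

At a fixed number of merges per query, the per-merge cost is the only term left to reduce, which motivates the question the slide ends on.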

  7. Talk Outline: (1) Setting: quantile roll-ups at scale. (2) Challenge: merging pre-aggregated summaries. (3) Summarizing data using statistics (moments sketch). (4) Improving sketch performance. (5) Results: benchmark + integration into data systems.

  8. Efficient data summaries using statistics. How can we optimize quantile summaries for aggregation? Use statistics to summarize sub-populations (at indexing time). Aggregate statistics using arithmetic (at query time). [Figure: per-sub-population statistics combined into an aggregate.]
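A minimal sketch of the idea, assuming the stored statistics are just a count and a sum per sub-population (the slide does not say which statistics are kept). The `Stats` class and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Stats:
    count: int = 0
    total: float = 0.0

    def add(self, x):
        """Indexing time: fold one raw value into the statistics."""
        self.count += 1
        self.total += x

    def merge(self, other):
        """Query time: combining sub-populations is plain addition."""
        return Stats(self.count + other.count, self.total + other.total)

    def mean(self):
        return self.total / self.count

# Made-up latencies for two pre-aggregated sub-populations.
android_usa, ios_usa = Stats(), Stats()
for v in [40.0, 48.0, 52.0]:
    android_usa.add(v)
for v in [30.0, 50.0]:
    ios_usa.add(v)

usa = android_usa.merge(ios_usa)   # no raw values are revisited
```

A merge costs a fixed number of additions regardless of how many rows each sub-population contains.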

  9. Moments: statistics that capture distribution shape. Moments are averages of powers of the data values. The $i$-th moment is $\mu_i = \frac{1}{n} \sum_{x \in X} x^i$; the first moment is the mean. Intuition: averages bound the number of “large” values. Constraining one such average (= 1) limits the size of the tail; constraining another (= 6) further limits the size of the tail. Given $k$ moments, the distribution is known to within $O(1/k)$, so we can estimate quantiles.
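The definition and the tail intuition can be sketched as follows. This is an illustration under stated assumptions: the class `MomentSummary` is hypothetical, it stores only a count and power sums, and the tail bound uses Markov's inequality for nonnegative data (the fraction of values at or above $t$ is at most $\mu_i / t^i$). It does not reproduce the quantile estimation procedure from the talk.

```python
class MomentSummary:
    """Keeps n and the power sums of x**i for i = 1..k."""

    def __init__(self, k):
        self.k = k
        self.n = 0
        self.power_sums = [0.0] * k   # power_sums[i - 1] holds the sum of x**i

    def add(self, x):
        self.n += 1
        for i in range(1, self.k + 1):
            self.power_sums[i - 1] += x ** i

    def merge(self, other):
        """Merging is k + 1 additions, independent of the data size."""
        self.n += other.n
        for i in range(self.k):
            self.power_sums[i] += other.power_sums[i]

    def moment(self, i):
        """i-th moment: mu_i = (1/n) * sum of x**i."""
        return self.power_sums[i - 1] / self.n

    def tail_bound(self, t):
        """Upper bound on the fraction of values >= t (nonnegative data, t > 0)."""
        best = min(self.moment(i) / t ** i for i in range(1, self.k + 1))
        return min(1.0, best)

# Made-up values split across two sub-populations.
a, b = MomentSummary(2), MomentSummary(2)
for v in [1, 2, 3]:
    a.add(v)
for v in [4, 6]:
    b.add(v)
a.merge(b)
bound = a.tail_bound(10)
```

With these values the first moment alone bounds the tail above 10 by 0.32, and adding the second moment tightens that to 0.132, matching the intuition that each extra moment further limits the tail.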
