A Study on Workload-Aware Wavelet Synopses for Point and Range Sum Queries

A Study on Workload-Aware Wavelet Synopses for Point and Range Sum Queries. Michael Mathioudakis (mathiou@cs.toronto.edu), Dimitris Sacharidis (dsachar@dblab.ntua.gr), Timos Sellis (timos@dblab.ntua.gr). DOLAP 2006.


  1. A Study on Workload-Aware Wavelet Synopses for Point and Range Sum Queries • Michael Mathioudakis, mathiou@cs.toronto.edu • Dimitris Sacharidis, dsachar@dblab.ntua.gr • Timos Sellis, timos@dblab.ntua.gr • DOLAP 2006

  2. Outline • Introduction • Wavelets • Error Metrics • Algorithms for Point Errors • Algorithms for Range Sum Errors • Experimental Results

  3. Introduction
     • Approximate Query Processing over Synopses: an effective approach to managing large data sets (e.g. OLAP queries)
       1. In query optimization: provide highly accurate query selectivity estimates
       2. In place of the actual data: provide quick approximate answers to large queries
     • Workload-Awareness: take user behavior into consideration
       - More accuracy for important data
       - Workload-aware synopses
     • Histograms and the wavelet transformation: commonly used synopsis construction techniques

  4. Introduction - Our Contribution • Focus on wavelet synopsis construction algorithms • Theoretical presentation of existing algorithms • Presentation of a novel workload-aware algorithm for range-sum queries • Experimental study - Accuracy vs Time Efficiency

  5. Outline • Introduction • Wavelets • Error Metrics • Algorithms for Point Errors • Algorithms for Range Sum Errors • Experimental Results

  6. Wavelet Preliminaries
     • It’s a transformation! [Figure: labels not recoverable from the extracted text]
     • Histograms: construct buckets on the initial data and assign one value per bucket [Figure: initial data a1 a2 a3 a4 a5 a6 a7 a8 grouped into Bucket 1, Bucket 2, Bucket 3]

  7. Wavelet Preliminaries
     • Haar wavelet transform (W/T): recursive pairwise calculation of averages and semi-differences (details), e.g. 11/4 = (3/2 + 4)/2 and -5/4 = (3/2 - 4)/2

     | Level | Pairwise averages | Details |
     |---|---|---|
     | Data | 2 2 0 2 3 5 4 4 | |
     | 1 | 2 1 4 4 | 0 -1 -1 0 |
     | 2 | 3/2 4 | 1/2 0 |
     | 3 | 11/4 | -5/4 |
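The recursive averaging on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `haar_transform` and the output layout (overall average first, then details from coarsest to finest) are assumptions, and the input length is assumed to be a power of two.

```python
def haar_transform(data):
    """Unnormalized Haar transform: [overall average, details coarse to fine].

    Assumes len(data) is a power of two.
    """
    averages = list(data)
    details = []
    while len(averages) > 1:
        pairs = [(averages[i], averages[i + 1])
                 for i in range(0, len(averages), 2)]
        # Coarser details are placed in front of the finer ones found earlier.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


coeffs = haar_transform([2, 2, 0, 2, 3, 5, 4, 4])
print(coeffs)  # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

The printed values are the slide's 11/4, -5/4, 1/2, 0, 0, -1, -1, 0 written as decimals.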

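  8. Wavelet Preliminaries
     • Initial values can be reconstructed in logarithmic time: only O(log N) coefficients are needed per value
     • Similar values for nearby data - small details
     • Coefficients near the root are more important - normalization needed
     [Figure: Haar error tree over the data 2 2 0 2 3 5 4 4 with coefficients 11/4, -5/4, 1/2, 0, 0, -1, -1, 0; each edge is labelled + or - to show how a coefficient contributes to the values below it]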
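As a sketch of the logarithmic-time reconstruction, the snippet below walks from a leaf of the error tree up to the root and adds or subtracts each coefficient on the path. The array layout is an assumption made for illustration: index 0 holds the overall average, index 1 the top detail, and the detail at index k has children 2k and 2k+1. The name `reconstruct_value` is hypothetical.

```python
def reconstruct_value(coeffs, i):
    """Rebuild data[i] from the O(log N) coefficients on its root-to-leaf path."""
    n = len(coeffs)
    value = coeffs[0]          # overall average
    node = n + i               # virtual leaf position below the detail nodes
    while node > 1:
        parent = node // 2
        # Left children (even positions) add the detail, right children subtract it.
        sign = 1 if node % 2 == 0 else -1
        value += sign * coeffs[parent]
        node = parent
    return value


coeffs = [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
print([reconstruct_value(coeffs, i) for i in range(8)])
# [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]
```

Each call touches log2(N) detail coefficients plus the average, which is the O(log N) bound stated on the slide.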

  9. Wavelet Synopses
     • Keep B coefficients
       - Dropped coefficients are considered zero
       - Error is introduced to the values of our data
     [Figure: Haar error tree over the data 2 2 0 2 3 5 4 4; keeping only 11/4, -5/4 and 1/2 reconstructs the data as 2 2 1 1 4 4 4 4; annotated Point Error = 1, Range Sum Error = 1]
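The effect of dropping coefficients can be checked with a small sketch. The inverse transform below assumes the layout used throughout these examples (average first, then details coarse to fine); `inverse_haar` and the particular choice of kept indices are illustrative assumptions, chosen to reproduce the approximate values shown on the slide.

```python
def inverse_haar(coeffs):
    """Invert the unnormalized Haar transform (average first, details coarse to fine)."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        finer = []
        for avg, det in zip(values, details):
            finer += [avg + det, avg - det]
        pos += len(values)
        values = finer
    return values


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]

kept = {0, 1, 2}  # hypothetical synopsis: indices of the retained coefficients
synopsis = [c if k in kept else 0.0 for k, c in enumerate(coeffs)]

approx = inverse_haar(synopsis)
errors = [d - a for d, a in zip(data, approx)]
print(approx)                       # [2.0, 2.0, 1.0, 1.0, 4.0, 4.0, 4.0, 4.0]
print(max(abs(e) for e in errors))  # 1.0
```

Dropped coefficients are simply treated as zero during reconstruction, so the synopsis only needs to store the kept (index, value) pairs.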

  10. Outline • Introduction • Wavelets • Error Metrics • Algorithms for Point Errors • Algorithms for Range Sum Errors • Experimental Results

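  11. Error Metrics
      • Weighted Error Metrics
      • For point queries: $L^w_p = \sum_i w[i]\, e[i]^p$
      • For range sum queries: $L^w_p = \sum_{i \le j} w[i,j]\, e[i:j]^p$

      | Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
      |---|---|---|---|---|---|---|---|---|
      | Initial Values | 0 | 4 | 2 | -2 | 8 | 2 | 3 | -1 |
      | After Synopsis | -1 | 3 | 3 | -1 | 3 | 3 | 5 | 1 |
      | Point Errors | 1 | 1 | -1 | -1 | 5 | -1 | -2 | -2 |

      Range Sum Error(2:5) = 4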
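The numbers on this slide can be reproduced with a short sketch. The function names are hypothetical, positions are 1-indexed to match Range Sum Error(2:5), and taking absolute values before raising to the power p is an assumption made here so that odd p behaves sensibly; the slide does not spell this out.

```python
def weighted_point_error(weights, errors, p):
    """Sum of w[i] * |e[i]|**p over all positions (absolute value is an assumption)."""
    return sum(w * abs(e) ** p for w, e in zip(weights, errors))


def range_sum_error(errors, i, j):
    """Signed error of the range sum over positions i..j (1-indexed, inclusive)."""
    return sum(errors[i - 1:j])


initial = [0, 4, 2, -2, 8, 2, 3, -1]
approx = [-1, 3, 3, -1, 3, 3, 5, 1]
errors = [x - y for x, y in zip(initial, approx)]

print(errors)                                    # [1, 1, -1, -1, 5, -1, -2, -2]
print(range_sum_error(errors, 2, 5))             # 4
print(weighted_point_error([1] * 8, errors, 2))  # 38
```

A range-sum error is the sum of the signed point errors in the range, so point errors of opposite sign partly cancel.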

  12. Outline • Introduction • Wavelets • Error Metrics • Algorithms for Point Errors • Algorithms for Range Sum Errors • Experimental Results

  13. Classic Algorithm • Minimizes $L_2$ of point errors • Selects the B largest normalized coefficients, using a heap • Complexity: O(N) space, O(N + B log N) time [Figure: Haar error tree over the data 2 2 0 2 3 5 4 4]
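A minimal sketch of the heap-based selection follows. It assumes the common normalization for unnormalized Haar coefficients, dividing a coefficient at resolution level l by sqrt(2^l) with the root at level 0; the slide does not state the formula, so treat that as an assumption. The name `classic_synopsis` and the array layout (average at index 0, detail k with children 2k and 2k+1) are likewise illustrative.

```python
import heapq
import math


def classic_synopsis(coeffs, budget):
    """Keep the `budget` coefficients with the largest normalized magnitude."""
    def normalized(k):
        level = max(k, 1).bit_length() - 1   # indices 0 and 1 both sit at level 0
        return abs(coeffs[k]) / math.sqrt(2 ** level)

    # Negate the keys because heapq is a min-heap.
    heap = [(-normalized(k), k) for k in range(len(coeffs))]
    heapq.heapify(heap)                                   # O(N)
    kept = [heapq.heappop(heap)[1]
            for _ in range(min(budget, len(heap)))]       # O(B log N)
    return {k: coeffs[k] for k in sorted(kept)}


coeffs = [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
print(classic_synopsis(coeffs, 4))  # {0: 2.75, 1: -1.25, 5: -1.0, 6: -1.0}
```

Building the heap once and popping B times gives the O(N + B log N) running time quoted on the slide. Note that the detail 1/2 at level 1 loses to the two -1 details at level 2 after normalization (about 0.354 versus 0.5).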

  14. Garofalakis - Kumar • Minimizes weighted error metrics • Dynamic programming algorithm on the transformation’s tree • Complexity: O(N^2) space, O(N^2 log B) time [Figure: error-tree node annotated with the already kept coefficients, the B coefficients available, their split into K and B-K, and the weights]

  15. Matias-Urieli • Minimizes $L^w_2$ of point errors • Applies a modified Haar wavelet transformation, then the classic algorithm • Complexity: O(N) space, O(N + B log N) time [Figure: weighted average and weighted difference of two values with weights w1 and w2]

  16. Outline • Introduction • Wavelets • Error Metrics • Algorithms for Point Errors • Algorithms for Range Sum Errors • Experimental Results

  17. Matias - Urieli • Minimizes $L_2$ - Complexity: O(N) space, O(N + B log N) time • Working with prefix sums has disadvantages: sparse data become dense, difficult to update [Figure: raw data 2 0 -2 4 -1 4 -2 0 → prefix sums 2 2 0 4 3 7 5 5 → Haar transformation on the prefix sums → greedily pick the largest B coefficients]
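The prefix-sum step and its density drawback can be illustrated with a short sketch. The helper `range_sum` is a hypothetical name and uses 0-indexed, inclusive positions; the sparse example array is made up for illustration.

```python
from itertools import accumulate


def range_sum(prefix, i, j):
    """Sum of raw[i..j] (0-indexed, inclusive) as a difference of two prefix sums."""
    return prefix[j] - (prefix[i - 1] if i > 0 else 0)


raw = [2, 0, -2, 4, -1, 4, -2, 0]
prefix = list(accumulate(raw))
print(prefix)                   # [2, 2, 0, 4, 3, 7, 5, 5]
print(range_sum(prefix, 2, 5))  # 5

# A sparse raw array turns into a dense prefix-sum array.
sparse = [0, 0, 5, 0, 0, 0, 0, 0]
print(list(accumulate(sparse)))  # [0, 0, 5, 5, 5, 5, 5, 5]
```

Because every range sum is a difference of two prefix-sum values, a synopsis built on the prefix sums can answer range-sum queries with two point lookups. The cost is that a single update to the raw data changes every later prefix sum.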

  18. RangeWave • Minimizes weighted $L_p$ of range-sum queries that follow a dyadic hierarchy • Workload-aware - applies on raw data [Figure: range-sum query workload drawn as a hierarchy of dyadic ranges over the raw data]
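For readers unfamiliar with dyadic ranges, the sketch below enumerates them level by level for a domain whose size is a power of two. Whether RangeWave's workload includes every level down to single positions is not stated on the slide, so the full hierarchy here is an assumption; `dyadic_ranges` is a hypothetical helper.

```python
def dyadic_ranges(n):
    """Return dyadic ranges as (start, end) pairs, inclusive, coarse to fine.

    Assumes n is a power of two.
    """
    levels = []
    length = n
    while length >= 1:
        levels.append([(s, s + length - 1) for s in range(0, n, length)])
        length //= 2
    return levels


for level in dyadic_ranges(8):
    print(level)
# [(0, 7)]
# [(0, 3), (4, 7)]
# [(0, 1), (2, 3), (4, 5), (6, 7)]
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7)]
```

Each dyadic range is exactly the set of data positions under one node of the Haar error tree, which is what makes this class of range-sum queries a natural fit for a tree-based algorithm on the raw data.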

  19. RangeWave • A dynamic programming algorithm • Complexity: O(N^2 log B) time, O(N^2) space [Figure: at node i with weight W[i], given the already kept coefficients and the B coefficients available, compute the error for the corresponding dyadic interval; the budget is split into K and B-K coefficients over the raw data]

  20. Outline • Introduction • Wavelets • Error Metrics • Algorithms for Point Errors • Algorithms for Range Sum Errors • Experimental Results

  21. Algorithms Summary

      Point Query Workload

      | Algorithm | Time | Space | Optimal |
      |---|---|---|---|
      | Matias - Urieli | N + B log N | N | Yes |
      | Garofalakis - Kumar | N^2 log B | N^2 | Yes |
      | Classic Wavelets | N + B log N | N | No |
      | Classic Histograms | N^2 B | N B | Yes |

      Dyadic Range Sum Query Workload

      | Algorithm | Time | Space | Optimal |
      |---|---|---|---|
      | RangeWave | N^2 log B | N^2 | Yes |
      | Koudas - Muthukrishnan | N^7 B^2 | N^5 B | Yes |
      | Matias - Urieli | N + B log N | N | Only for uniform workload |
      | Classic | N + B log N | N | No |

  22. Experimental Study: Point-Query Workloads • Data and point workload follow a Zipfian distribution • Increasing synopsis size • Matias-Urieli provides the best trade-off between accuracy (weighted $L_2$ error) and running time

  23. Experimental Study Unbiased Dyadic Range Sum Query Workload • RangeWave exhibits significant accuracy gains as the synopsis size increases for this workload • Classic still performs well

  24. Experimental Study: Biased Dyadic Range Sum Query Workload • Biased workload: assigns more significance to larger range-sum queries • The accuracy of RangeWave is orders of magnitude higher

  25. Conclusions
      • Point Query Workloads: You get what you pay for
        - Quadratic algorithms outperform linear ones in accuracy, at a high price
      • Range Sum Query Workloads: We can do better
        - Find a linear time algorithm for all range sum queries
        - Extend RangeWave to a general hierarchy of queries

  26. Thank You
