Processing Complex Aggregate Queries over Data Streams
SIGMOD 2002
Alin Dobra, Minos Garofalakis, Johannes Gehrke, Rajeev Rastogi
June 4, 2002

Processing Network Data Streams
[Figure: six Telco/LAN routers feed data streams to a Network Operations Center, which runs a data-stream join query (SELECT COUNT(*)).]
Dobra, Garofalakis, Gehrke and Rastogi – Processing Aggregate Queries over Streams
Query-Processing
[Figure: the streams for R1, R2, ..., Rr flow into a stream engine, which keeps in memory a sketch for R1, a sketch for R2, ..., a sketch for Rr. A query Q(R1,...,Rr) is posed to the engine, which returns an approximate answer to Q.]
$\epsilon^2 L^2$
E pairwise independent samples of X
[Figure: the ξ family of random variables. A seed drawn from the uniform random seed space (size $2^{65}$) selects a function h that assigns −1 or +1 to each data value 1, 2, ..., $2^{32} - 1$.]
$\sum_{i=1}^{n} f_i g_i = 3 \cdot 3 + 1 \cdot 0 + 2 \cdot 2 = 13$
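As a small illustration of this sum of frequency products, the join size can be computed exactly from per-value counts. The streams and the names `join_size`, `F`, and `G` below are hypothetical, chosen only so that the frequencies match the worked numbers (f = 3, 1, 2 and g = 3, 0, 2).

```python
from collections import Counter

def join_size(f_stream, g_stream):
    """Exact size of the equi-join of two streams of join-attribute values."""
    f = Counter(f_stream)  # f[i] = number of tuples in F with join value i
    g = Counter(g_stream)  # g[i] = number of tuples in G with join value i
    # Each value i contributes f[i] * g[i] joined pairs; a missing key counts as 0.
    return sum(f[i] * g[i] for i in f)

# Hypothetical streams with frequencies f = (3, 1, 2) and g = (3, 0, 2).
F = [1, 1, 1, 2, 3, 3]
G = [1, 1, 1, 3, 3]
print(join_size(F, G))  # 3*3 + 1*0 + 2*2 = 13
```

This exact computation needs one counter per distinct value, which is the memory cost that a sketch-based estimate is meant to avoid.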
$\sum_{t \in F} \xi_{t.a}$
$\sum_{t \in G} \xi_{t.a}$
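Each of these two sums can be maintained as a single running counter per stream: every arriving tuple adds the ±1 variable ξ selected by its join value. The sketch below assumes, as in the usual sketching construction, that the product of the two counters serves as the join-size estimate. The names `atomic_sketch`, `F`, `G`, and the explicit `xi` dictionary are illustrative and not taken from the slides. To stay deterministic, it averages the product over every possible sign assignment instead of drawing ξ from a random seed.

```python
from itertools import product

def atomic_sketch(stream, xi):
    """Sum of xi[t.a] over the tuples t of a stream of join-attribute values."""
    total = 0
    for a in stream:  # one pass over the stream, one counter of state
        total += xi[a]
    return total

F = [1, 1, 1, 2, 3, 3]  # frequencies f = (3, 1, 2)
G = [1, 1, 1, 3, 3]     # frequencies g = (3, 0, 2)
domain = [1, 2, 3]

# Enumerate all 2^3 assignments of -1/+1 to the domain values.
estimates = []
for signs in product([-1, 1], repeat=len(domain)):
    xi = dict(zip(domain, signs))
    estimates.append(atomic_sketch(F, xi) * atomic_sketch(G, xi))

mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate)  # 13.0, the exact join size
```

A single product can be far from the true value (it is 30 when every ξ is +1), which is why many independent copies would be combined in practice.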
$\sum_{i=1}^{n} f_i g_i$ define:
$\sum_{i=1}^{n} f_i \dots$
$\sum_{t \in F} \xi_{t.a}$
$\sum_{t \in G} t.b \, \xi_{t.a}$
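Compared with the counting case, the sum over G weights each ±1 variable by the tuple's `b` value, which is the pattern one would use to estimate a SUM over the join instead of a COUNT. A minimal sketch under that reading follows. The `(a, b)` tuple layout, the sample data, and all function names are assumptions made for illustration.

```python
from itertools import product

def count_sketch(stream, xi):
    """Sum of xi[t.a] over tuples of F, given as a list of join values a."""
    return sum(xi[a] for a in stream)

def sum_sketch(stream, xi):
    """Sum of t.b * xi[t.a] over tuples of G, given as (a, b) pairs."""
    return sum(b * xi[a] for a, b in stream)

def exact_join_sum(f_stream, g_stream):
    """Reference answer: SUM(G.b) over all joining (F, G) tuple pairs."""
    return sum(b for fa in f_stream for a, b in g_stream if fa == a)

F = [1, 1, 2]                   # hypothetical join values in F
G = [(1, 10), (2, 5), (3, 7)]   # hypothetical (a, b) tuples in G
domain = [1, 2, 3]

# Average the product of the two sketches over every sign assignment.
products = []
for signs in product([-1, 1], repeat=len(domain)):
    xi = dict(zip(domain, signs))
    products.append(count_sketch(F, xi) * sum_sketch(G, xi))

mean_estimate = sum(products) / len(products)
print(mean_estimate, exact_join_sum(F, G))  # 25.0 25
```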
$\sum_{i=1}^{n} f_i \dots$
$\xi^{a}_{t.a} \, \xi^{b}_{t.b}$
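$\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \dots$
$\sum_{i=1}^{n_1} \dots \xi^{a}_{i}$,
$\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \dots \xi^{a}_{i} \xi^{b}_{j}$,
$\sum_{j=1}^{n_2} \dots \xi^{b}_{j}$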
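With two join attributes, one independent ±1 family is used per attribute (written ξ^a and ξ^b here), and a relation carrying both attributes multiplies the two variables for each of its tuples. The sketch below assumes a chain join F(a) ⋈ G(a, b) ⋈ H(b). The relation names, the sample data, and the helper functions are hypothetical. As a deterministic check, it averages the product of the three sketches over all sign assignments of both families.

```python
from itertools import product

def sketch_f(F, xa):
    """Sum of xa[t.a] over tuples of F (list of a values)."""
    return sum(xa[a] for a in F)

def sketch_g(G, xa, xb):
    """Sum of xa[t.a] * xb[t.b] over tuples of G (list of (a, b) pairs)."""
    return sum(xa[a] * xb[b] for a, b in G)

def sketch_h(H, xb):
    """Sum of xb[t.b] over tuples of H (list of b values)."""
    return sum(xb[b] for b in H)

def exact_count(F, G, H):
    """Reference answer: number of (F, G, H) tuple triples that join."""
    return sum(1 for fa in F for a, b in G for hb in H if fa == a and hb == b)

F = [1, 1, 2]
G = [(1, 1), (1, 2), (2, 2)]
H = [1, 2, 2]
dom_a, dom_b = [1, 2], [1, 2]

products = []
for sa in product([-1, 1], repeat=len(dom_a)):      # all choices for family a
    for sb in product([-1, 1], repeat=len(dom_b)):  # all choices for family b
        xa, xb = dict(zip(dom_a, sa)), dict(zip(dom_b, sb))
        products.append(sketch_f(F, xa) * sketch_g(G, xa, xb) * sketch_h(H, xb))

mean_estimate = sum(products) / len(products)
print(mean_estimate, exact_count(F, G, H))  # 8.0 8
```

Using a separate family per attribute is what makes the cross terms cancel on average; reusing one family for both attributes would not give the same guarantee.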
$n_1, \dots, n_m$
$r$
Relative error $= \frac{|\text{actual} - \text{approx}|}{\text{actual}}$
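For concreteness, this error measure can be written as a small helper. The function name `relative_error` and the sample numbers are illustrative only, and the helper assumes a positive actual value, as is the case for a non-empty COUNT result.

```python
def relative_error(actual, approx):
    """Return |actual - approx| / actual for a positive actual value."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return abs(actual - approx) / actual

# An estimate of 25 for a true answer of 20 is off by 25%.
print(100 * relative_error(20, 25))  # 25.0
```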
[Figure: relative error (%), axis from 10 to 100, versus memory (words), axis from 500 to 4000, for the sketch and histogram methods.]
[Figure: relative error (%), axis from 2 to 12, versus memory (words), axis from 500 to 4000, for the sketch and histogram methods.]
[Figure: relative error (%), axis from 2 to 16, versus memory (words), axis from 2000 to 12000, for the sketch and histogram methods.]
[Figure: relative error (%), axis from 1 to 7, versus number of partitions, axis from 1 to 8, with one curve each for 25, 50, and 100 buckets.]
[Figure: relative error (%), axis from 20 to 180, versus number of partitions, axis from 2 to 16, with one curve each for 25, 50, and 100 buckets.]