SLIDE 1
Problem Statement
Given a large multiset X with n elements, count the number of distinct elements in X. Alternatively, given samples from a distribution, estimate the 0-th frequency moment.
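As a small illustration of the problem statement, the sketch below computes the exact distinct count and the k-th frequency moment of a multiset held fully in memory. The function names and the toy multiset are hypothetical, chosen only for this example; the point is that the 0-th frequency moment coincides with the number of distinct elements.

```python
from collections import Counter

def frequency_moment(items, k):
    """k-th frequency moment: sum of (multiplicity ** k) over distinct values."""
    counts = Counter(items)
    return sum(c ** k for c in counts.values())

def distinct_count(items):
    """Exact number of distinct elements (requires a full pass over the data)."""
    return len(set(items))

# Hypothetical toy multiset, not taken from the slides.
X = ["a", "b", "a", "c", "b", "a"]

print(distinct_count(X))        # number of distinct elements
print(frequency_moment(X, 0))   # 0-th moment equals the distinct count
print(frequency_moment(X, 1))   # 1st moment equals n, the multiset size
```

An exact count like this needs to see every element, which is what makes estimation from a sample attractive when the multiset is large.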
Distinct Value Estimators For Zipfian Distributions
Sergei Vassilvitskii, Rajeev Motwani
Stanford University
$E_{D,\theta}[f_r] = f_r^*$
$f_r^* = E_{\hat{D},\theta}[f_r]$
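The relation above can be read as an estimation recipe: pick the value of D whose expected statistic matches the observed one. The sketch below is one possible illustration, not the authors' implementation. It assumes that f_r denotes the number of distinct values in a sample of r independent draws, that f_r^* is the observed value, that the skew θ is known, and that element i of a Zipfian distribution on D elements has probability proportional to 1/i^θ. The function names and the search bound `d_max` are hypothetical. The search relies on the expectation being non-decreasing in D for fixed θ and r.

```python
def zipf_probs(D, theta):
    """Zipfian probabilities p_i proportional to 1 / i**theta, for i = 1..D."""
    weights = [1.0 / (i ** theta) for i in range(1, D + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_distinct(D, theta, r):
    """Expected number of distinct values among r i.i.d. draws.

    Element i is seen at least once with probability 1 - (1 - p_i)**r,
    so the expectation is the sum of these terms over all D elements.
    """
    return sum(1.0 - (1.0 - p) ** r for p in zipf_probs(D, theta))

def estimate_D(f_obs, theta, r, d_max):
    """Smallest D in [1, d_max] whose expected distinct count reaches f_obs.

    Binary search, assuming expected_distinct is non-decreasing in D.
    """
    lo, hi = 1, d_max
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_distinct(mid, theta, r) < f_obs:
            lo = mid + 1
        else:
            hi = mid
    return lo

# Round-trip check on hypothetical parameters: feed the exact expectation
# for a known D back in as the "observed" value and recover D.
true_D, theta, r = 200, 0.5, 100
f_obs = expected_distinct(true_D, theta, r)
print(estimate_D(f_obs, theta, r, d_max=1000))
```

Each evaluation of the expectation costs O(D) time, so this direct search is only meant to show the inversion idea on small inputs.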
[Figure: Ratio Error vs. Number of Samples (x 1000) for the ZE, AE, and GEE estimators; Theta = 0.5, D = 50000, n = 1M]
[Figure: Ratio Error vs. % DB Sampled for the ZE, AE, and GEE estimators; Router Dataset]