Lower Bounds for Quantile Estimation in Random-Order and Multi-Pass Streams
Sudipto Guha (UPenn) Andrew McGregor (UCSD)
Data Stream Model
Stream: m elements from a universe of size n:
3,5,3,7,5,4,8,5,3,7,5,4,8,6,3,2,6,4,7,3,4, ...
e.g., IP packets, search engine queries, data read from external memory, device, sensor readings...
No control over the ordering of elements
Limited working memory S
Limited time to process each element
[Morris ’78] [Munro, Paterson ’78] [Flajolet, Martin ’85] [Alon, Matias, Szegedy ’96] [Henzinger, Raghavan, Rajagopalan ’98] [Feigenbaum, Kannan, Strauss, Viswanathan ’99]
histograms, clustering, entropy, graph problems...
Form of average case analysis
Stream of independent samples
Uncorrelated fields in a database...
Frequent elements
[Demaine, Lopez-Ortiz, Munro ’02]
Entropy & Distances
[Guha, McGregor, Venkatasubramanian ’06]
Histograms
[Guha, McGregor ’07]
Quantiles...
[Munro, Paterson ’78], [Guha, McGregor ’06]
element of rank = m/2±t
AOM: εm-approx in O(ε^{-1} lg(εm)) space
[Greenwald, Khanna ’01], [Shrivastava, Buragohain, Agrawal, Suri ’04] [Cormode, Korn, Muthukrishnan, Srivastava ’06]
ROM: 1-pass exact selection in O(m^{1/2}) space
[Munro, Paterson ’78]
ROM: 1-pass m^{1/2+ε}-approx in O(2^{1/ε} polylog m) space
ROM: O(lg lg m)-pass selection in O(polylog m) space
[Guha, McGregor ’06]
Are these ROM results possible in the AOM model?
Can these ROM results be improved?
a) 1-pass, O(polylg m)-space, Õ(m^{1/2})-approx
b) O(lg lg m)-pass, O(polylg m)-space exact selection
a) 1-pass, Õ(m^{1/2})-approx requires Ω(m^{1/2}) space
b) O(polylg m)-space exact requires Ω(lg m) passes
t-approx requires Ω(m^{1/2} t^{-3/2}) space.
1) Maintain bounds [a,b] for median and c in [a,b]
2) Split stream in segments: S1, E1, S2, E2, ... , Sp, Ep
3) For i ∈ [p]:
   Sample c ∈ Si ∩ [a,b]
   Estimate rank(c) from Ei
   Update [a,b]
[Figure: value against stream position, with the bounds a and b marked and the stream split into segments S1, E1, S2, E2, S3, E3]
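The three steps above can be sketched in Python. This is only an illustration under assumptions that the slides do not state: the stream length is known in advance, every segment Si and Ei has a fixed length (`sample_len`, `estimate_len`), the first element of Si that falls strictly between a and b is taken as the sample c, and [a,b] is updated after every estimate. The function name and the rule for choosing the returned element are likewise hypothetical.

```python
import random

def approx_median_one_pass(stream, sample_len, estimate_len):
    """Segment-based median search over a stream in random order.

    Assumptions (not from the slides): known stream length, fixed
    positive segment lengths, and an update of [a, b] on every estimate.
    """
    m = len(stream)
    a, b = float("-inf"), float("inf")    # current bounds on the median
    best, best_gap = None, float("inf")   # candidate with estimate nearest m/2
    it = iter(stream)                     # each element is read exactly once
    consumed = 0
    while consumed < m:
        # Segment S_i: sample one candidate c strictly between a and b.
        c = None
        for _ in range(min(sample_len, m - consumed)):
            x = next(it)
            consumed += 1
            if c is None and a < x < b:
                c = x
        if c is None:
            continue
        # Segment E_i: estimate rank(c) from the fraction of smaller elements.
        seen = below = 0
        for _ in range(min(estimate_len, m - consumed)):
            x = next(it)
            consumed += 1
            seen += 1
            below += x < c
        if seen == 0:
            break
        est_rank = below * m / seen
        if abs(est_rank - m / 2) < best_gap:
            best, best_gap = c, abs(est_rank - m / 2)
        # Update [a, b]: c becomes the new lower or upper bound.
        if est_rank < m / 2:
            a = c
        else:
            b = c
    return best

rng = random.Random(0)
data = list(range(10_000))
rng.shuffle(data)                         # simulate a random-order stream
c = approx_median_one_pass(data, sample_len=100, estimate_len=400)
print(c)  # a value whose rank should be near m/2 = 5000
```

The sketch stores only a, b, c and a few counters, which mirrors the small-space flavour of the approach; the accuracy it achieves depends on the segment lengths chosen here for illustration.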
rank(c) is ±t w.h.p.
then there exists c in Si ∩ [a,b] w.h.p.
hence p = O(lg m) w.h.p.
element with rank m/2±t using O(log m) space.
INDEX: “What’s the value of x_i?”
length m binary string x
index i in range [m]
Requires Ω(m) bits transmitted.
median with prob. at least 3/4 using S space.
[Henzinger, Raghavan, Rajagopalan ’99]
2+x_1, ..., 2i+x_i, ..., 2m+x_m
MEMORY STATE OF ALGORITHM
0, ..., 0 (m-j copies), 2m+2, ..., 2m+2 (j-1 copies)
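The stream in this reduction can be checked with a short Python sketch. It assumes that j denotes the index held by the second player (the slide writes i for the index but j in the padding counts) and that the first player inserts the elements 2i + x_i; the function names are hypothetical. With m-j zeros and j-1 copies of 2m+2 appended, the stream has 2m-1 elements and its exact median is 2j + x_j, so the parity of the median reveals the bit x_j.

```python
def build_stream(x, j):
    """Elements 2i + x_i for the bit string x, then the padding for index j (1-based)."""
    m = len(x)
    encoded = [2 * i + x[i - 1] for i in range(1, m + 1)]   # 2i + x_i
    padding = [0] * (m - j) + [2 * m + 2] * (j - 1)         # m-j zeros, j-1 copies of 2m+2
    return encoded + padding

def decode_bit(stream):
    """Recover x_j from the exact median of the 2m-1 stream elements."""
    median = sorted(stream)[len(stream) // 2]               # element of rank m
    return median % 2

x = [1, 0, 0, 1, 1]
print([decode_bit(build_stream(x, j)) for j in range(1, len(x) + 1)])
# prints [1, 0, 0, 1, 1]
```

Sorting is used here only to verify the construction; in the reduction itself a streaming algorithm would process the first part, hand over its memory state, and continue on the padding.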
length m1 binary string x
index i in range [m1]
random-ordering, succeeds w/p 2/3.
Alice: picks b randomly from [m1] and inserts a random permutation of
{ 0, ..., 0, 2 + x_1, ..., 2m1 + x_m1, 2m + 2, ..., 2m + 2 }
MEMORY STATE OF ALGORITHM and “b”
Bob: inserts a random permutation of
{ 0, ..., 0, 2m + 2, ..., 2m + 2 }
function fA: [t]→[t]
function fB: [t]→[t]
Compute fA(fB( ... (fA(fB(fA(1)))) ... )), with k function applications
If Bob speaks first: k messages: O(k log t) bits. k-1 messages: Ω(t) bits.
[Nisan, Wigderson ’93]
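The O(k log t) upper bound can be illustrated with a small Python simulation. It assumes the favourable speaking order, in which the player holding the next function to apply sends the current pointer, and it charges ⌈lg t⌉ bits per message; the function name and the dictionary representation of fA and fB are hypothetical. The sketch says nothing about the Ω(t) lower bound for k-1 messages, which is the hard direction.

```python
import math

def chase(f_a, f_b, k, t):
    """Apply k alternating functions (innermost f_a), one message per application."""
    bits_per_message = max(1, math.ceil(math.log2(t)))
    z, bits = 1, 0                            # the chase starts at pointer 1
    for step in range(k):
        f = f_a if step % 2 == 0 else f_b     # owner of the next function
        z = f[z]                              # owner applies its function locally
        bits += bits_per_message              # and announces the new pointer
    return z, bits

f_a = {1: 2, 2: 3, 3: 1}
f_b = {1: 3, 2: 1, 3: 2}
print(chase(f_a, f_b, k=3, t=3))  # fA(fB(fA(1))) and the bits sent: (2, 6)
```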
function fA: [t]→[t]
function fB: [t]→[t]
function fC: [t]→[t]
...
function fZ: [t]→[t]
Compute fZ( ... fC(fB(fA(1))) ... )
Order of speaking “Z, ..., C, B, A”:
k rounds: O(k log t) bits. k-1 rounds: Ω(t) bits.
[Figure: bits communicated]
fA: [t]→[t] fB: [t]→[t] fC: [t]→[t]
Median = ( fA(1), fB(fA(1)), fC(fB(fA(1))) )
VALUE (three digits; "x" gives the number of copies)
1 1 fC(1)    1 2 fC(2)    1 3 fC(3)
2 1 fC(1)    2 2 fC(2)    2 3 fC(3)
3 1 fC(1)    3 2 fC(2)    3 3 fC(3)
1 0 0 x (3-fB(1))    1 4 0 x (fB(1)-1)
2 0 0 x (3-fB(2))    2 4 0 x (fB(2)-1)
3 0 0 x (3-fB(3))    3 4 0 x (fB(3)-1)
0 0 0 x (3-fA(1)) x 5    4 0 0 x (fA(1)-1) x 5
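The claim that the median of these values encodes fC(fB(fA(1))) can be checked by brute force. The Python sketch below treats each value as a three-digit tuple compared lexicographically, which is an assumption about how the slide's values are ordered. The slide shows t = 3; the general multiplicity 2t-1 and padding digit t+1 used in the code are an extrapolation from the 5 and 4 that appear there, and the function names are hypothetical.

```python
def build_elements(f_a, f_b, f_c, t=3):
    """Multiset of three-digit values for functions given as dicts on {1, ..., t}."""
    block = 2 * t - 1                  # elements sharing a first digit; 5 when t = 3
    hi = t + 1                         # padding digit; 4 when t = 3
    elems = []
    for i in range(1, t + 1):
        for j in range(1, t + 1):
            elems.append((i, j, f_c[j]))            # rows "i j fC(j)"
        elems += [(i, 0, 0)] * (t - f_b[i])         # rows "i 0 0 x (t - fB(i))"
        elems += [(i, hi, 0)] * (f_b[i] - 1)        # rows "i 4 0 x (fB(i) - 1)"
    elems += [(0, 0, 0)] * ((t - f_a[1]) * block)   # row "0 0 0 x (t - fA(1)) x 5"
    elems += [(hi, 0, 0)] * ((f_a[1] - 1) * block)  # row "4 0 0 x (fA(1) - 1) x 5"
    return elems

def median(elems):
    """Middle element in lexicographic order (the count is odd)."""
    return sorted(elems)[len(elems) // 2]

f_a = {1: 2, 2: 1, 3: 3}
f_b = {1: 1, 2: 3, 3: 2}
f_c = {1: 3, 2: 2, 3: 1}
print(median(build_elements(f_a, f_b, f_c)))
# prints (2, 3, 1), i.e. (fA(1), fB(fA(1)), fC(fB(fA(1))))
```

The padding counts steer the median first into the block whose leading digit is fA(1) and then, within that block, to the row whose second digit is fB(fA(1)).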
generalizing to approximation gives...
passes requires Ω(m^{(1-δ)/p} p^{-6}) space.
a) 1-pass, O(polylog m)-space, Õ(m^{1/2})-approx
b) O(lg lg m)-pass, O(polylog m)-space exact selection
a) 1-pass, Õ(m^{1/2})-approx requires Ω(m^{1/2}) space
b) O(polylog m)-space exact requires Ω(lg m) passes
world of random-order streaming...