
Mergeable Summaries: Jeff M. Phillips, University of Utah (PowerPoint presentation)



  1. Mergeable Summaries
  Jeff M. Phillips (University of Utah), joint with Pankaj K. Agarwal (Duke), Graham Cormode (AT&T), Zengfeng Huang (HKUST), Zhewei Wei (HKUST) and Ke Yi (HKUST)
  [Figure: summaries S(P, ε) and S(Q, ε) of data sets P and Q are merged into S(P ∪ Q, ε); the size of S(X, ε) is always m.]

  2. Summaries for MASSIVE Data
  Summaries allow approximate computation with guarantees in small space.
  • Coreset: a small summary that serves as a proxy for the full data set, with approximation guarantees:
    • ε-samples of (P, R): approximate density
    • ε-kernels: approximate convex shape
  • Sketch: a (random) (linear) combination of the full data, from which functions can be recovered with approximation guarantees:
    • Euclidean distance: Johnson-Lindenstrauss random projection
    • Count-Min sketch: approximate item counts
    • Greenwald-Khanna sketch: approximate quantiles
    • Misra-Gries sketch: approximate frequent items



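  13. Massive Distributed Computation
  Settings: data centers, sensor networks, multi-core.
  Data sets P and Q are summarized separately as S(P, ε) and S(Q, ε), and the two summaries are merged into S(P ∪ Q, ε), a summary of the union with the same error ε. The size of S(X, ε) is always m.
  • Similar to MUD and Dremel, but more restrictive and “natural”
  • Generalizes streaming
  • Allows archiving summaries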
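The merge model on this slide can be illustrated with any summary that supports a merge operation. The minimal Python sketch below uses a hypothetical toy summary, `MinMaxCount` (exact count, minimum and maximum, so its error is 0 and its size is constant), purely to show the control flow; the helpers `merge_tree` and `merge_stream` are likewise illustrative names, not from the talk. Per-partition summaries are merged pairwise up a balanced tree, and streaming appears as the special case of merging in one single-item summary at a time.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class MinMaxCount:
    """Hypothetical toy summary: exact count, min and max of a non-empty data set."""
    count: int
    lo: float
    hi: float

    @staticmethod
    def of(items):
        items = list(items)
        return MinMaxCount(len(items), min(items), max(items))

    def merge(self, other):
        # Summary of the union, built only from the two summaries.
        return MinMaxCount(self.count + other.count,
                           min(self.lo, other.lo), max(self.hi, other.hi))


def merge_tree(summaries):
    """Merge summaries pairwise, level by level (a balanced merge tree)."""
    summaries = list(summaries)
    while len(summaries) > 1:
        paired = [a.merge(b) for a, b in zip(summaries[0::2], summaries[1::2])]
        if len(summaries) % 2:          # odd one out moves up unchanged
            paired.append(summaries[-1])
        summaries = paired
    return summaries[0]


def merge_stream(items):
    """Streaming as a special case: merge in one single-item summary per arrival."""
    items = list(items)
    first = MinMaxCount.of(items[:1])
    return reduce(lambda s, x: s.merge(MinMaxCount.of([x])), items[1:], first)
```

For this exact toy summary the result does not depend on the merge order at all; for the approximate summaries in the rest of the talk, mergeability asks that the ε guarantee and the size m survive any order of merges.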


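  21. Random Sample
  Tag every element with an independent random value (ran) and let the summary keep the elements with the largest tags: the max element generalizes to the top k elements (here k = 6). Merging two samples keeps the top k tags of their union.
  P, sorted by tag:
  val  15   7    10   14   20   17   42   3    8    1
  ran  .99  .82  .75  .61  .53  .42  .23  .14  .02  .01
  Q, sorted by tag:
  val  31   9    16   11   14   7    2    13   21   4
  ran  .90  .85  .80  .57  .50  .37  .31  .12  .10  .08
  S(P ∪ Q, ε), the 6 largest tags of P ∪ Q:
  val  15   31   9    7    16   10
  ran  .99  .90  .85  .82  .80  .75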
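A minimal Python sketch of this random-sample summary, assuming uniform tags drawn from Python's `random` module and `heapq.nlargest` to select the largest tags. The function names `tag`, `top_k` and `merge_samples` are hypothetical. A summary is a list of (tag, value) pairs, and merging keeps the k largest tags of the two lists.

```python
import heapq
import random


def tag(items, rng):
    """Attach an independent uniform random tag ("ran" on the slide) to each item."""
    return [(rng.random(), x) for x in items]


def top_k(tagged, k):
    """The sample summary: the k (tag, value) pairs with the largest tags, largest first."""
    return heapq.nlargest(k, tagged)


def merge_samples(s1, s2, k):
    """Merge two samples: the k largest tags of their union.

    This equals top_k of the tagged union, because each of the union's top k
    elements is also among the top k of its own side.
    """
    return heapq.nlargest(k, s1 + s2)
```

With the slide's tags written as (tag, value) pairs for P and Q and k = 6, `merge_samples(top_k(P, 6), top_k(Q, 6), 6)` returns the values 15, 31, 9, 7, 16, 10, the same as taking the top 6 of P ∪ Q directly.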


  24. Linear Sketches
  Count-Min sketch of a vector P[1...U]:
  • Linear sketch stored as an array CM of size w × d
  • Use d hash functions h_1, ..., h_d, each mapping an index x to [1...w]
  • Estimate P[i] ≈ min_j CM[h_j(i), j]
  Mergeable: CM(P + Q) = CM(P) + CM(Q), so S(P ∪ Q, ε) is obtained by adding the arrays of S(P, ε) and S(Q, ε) entry by entry (both sketches must use the same hash functions). The size of S(X, ε) is always m.
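The Count-Min structure and its merge can be sketched in Python as follows. Several details are assumptions, not from the slide: the Carter-Wegman style hash family ((a·x + b) mod p) mod w with random a, b and a Mersenne prime p, 0-based buckets, and a table stored one row per hash function (the transpose of the slide's CM[h_j(i), j] layout). The class name `CountMin` and its methods are illustrative.

```python
import random


class CountMin:
    """Count-Min sketch: d rows of w counters, one hash function per row.

    Assumed hash family: h_j(x) = ((a_j * x + b_j) mod PRIME) mod w.
    Stored as table[j][h_j(x)], the transpose of the slide's layout.
    """

    PRIME = 2**61 - 1  # Mersenne prime, larger than any index used here

    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)  # same seed gives the same hash functions
        self.w, self.d = w, d
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(d)]
        self.table = [[0] * w for _ in range(d)]

    def _bucket(self, j, x):
        a, b = self.hashes[j]
        return ((a * x + b) % self.PRIME) % self.w

    def update(self, x, count=1):
        # Add `count` to one counter in every row.
        for j in range(self.d):
            self.table[j][self._bucket(j, x)] += count

    def estimate(self, x):
        # With non-negative updates every counter x maps to is >= P[x];
        # the minimum over rows is the tightest of these upper bounds.
        return min(self.table[j][self._bucket(j, x)] for j in range(self.d))

    def merge(self, other):
        # Linearity: CM(P + Q) = CM(P) + CM(Q), entry by entry.
        if (self.w, self.d, self.hashes) != (other.w, other.d, other.hashes):
            raise ValueError("sketches must share w, d and hash functions")
        out = CountMin(self.w, self.d)
        out.hashes = self.hashes
        out.table = [[a + b for a, b in zip(r1, r2)]
                     for r1, r2 in zip(self.table, other.table)]
        return out
```

Two sketches built with the same `w`, `d` and `seed` can be merged, and the merged table is identical to the table of a single sketch fed both streams.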

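  25. Heavy Hitters Summaries
  Misra-Gries (MG) sketch of P[1...U]:
  • Keep k (index, count) pairs
  • If an index that already has a pair arrives, increment its count
  • If a new index arrives, make a new pair if fewer than k pairs are in use; otherwise decrement all counts by 1 (the new index is not stored) and drop pairs whose count reaches 0
  Mergeable: stack MG(P) + MG(Q), then decrement all counts by C_{k+1}, the (k+1)-th largest count.
  Example with k = 5: starting from (1,5) (3,6) (8,1) (11,1) (14,3), an arrival of index 11 gives (1,5) (3,6) (8,1) (11,2) (14,3); an arrival of a new index then decrements all counts, giving (1,4) (3,5) (11,1) (14,2), and the pair for index 8 drops out.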


  30. Heavy Hitters Summaries
  The resulting summary S(P, ε) = (1,4) (3,5) (11,1) (14,2) satisfies |P[i] − MG[i]| ≤ ε = m̂/(k + 1) for every index i, where MG[i] = 0 for indices without a pair.
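The Misra-Gries update rule can be sketched in Python with a dict mapping index to count. This is an illustrative sketch; the names `mg_update` and `mg_summary` are hypothetical.

```python
def mg_update(counters, x, k):
    """Apply one arrival of index x to a Misra-Gries summary.

    `counters` maps index -> count and never holds more than k pairs.
    """
    if x in counters:
        counters[x] += 1              # existing index: update its count
    elif len(counters) < k:
        counters[x] = 1               # room left: make a new pair
    else:
        # New index and all k pairs in use: decrement every count by 1
        # (the arriving x is not stored) and drop pairs that reach 0.
        for key in list(counters):
            counters[key] -= 1
            if counters[key] == 0:
                del counters[key]
    return counters


def mg_summary(stream, k):
    """Build an MG summary of a whole stream, one arrival at a time."""
    counters = {}
    for x in stream:
        mg_update(counters, x, k)
    return counters
```

Starting from {1: 5, 3: 6, 8: 1, 11: 1, 14: 3} with k = 5, `mg_update(c, 11, 5)` raises index 11 to 2, and a following update with an index not in the summary (6, say) leaves {1: 4, 3: 5, 11: 1, 14: 2}. Each decrement step discards k + 1 units of count (k stored ones plus the arriving item), so on a stream of length n whose summary has total count n̂, every count is underestimated by at most (n − n̂)/(k + 1).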


  34. Heavy Hitters Summaries
  Merging S(P, ε) and S(Q, ε) into S(P ∪ Q, ε) with k = 5: stacking MG(P) + MG(Q) gives (1,6) (3,6) (5,2) (9,5) (11,1) (14,6). That is k + 1 = 6 pairs with C_{k+1} = 1, so every count is decremented by 1: S(P ∪ Q, ε) = (1,5) (3,5) (5,1) (9,4) (14,5), and the pair for index 11 drops out. The size of S(X, ε) is always m.
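The merge step (stack, then subtract C_{k+1}) can be sketched in Python with `collections.Counter`, whose `update` adds the counts of shared keys. `mg_merge` is a hypothetical name. In the usage below, the first input is the summary (1,4) (3,5) (11,1) (14,2) and the second is a hypothetical summary chosen so that the stack is (1,6) (3,6) (5,2) (9,5) (11,1) (14,6).

```python
from collections import Counter


def mg_merge(a, b, k):
    """Merge two MG summaries (dicts index -> count, at most k pairs each)."""
    stacked = Counter(a)
    stacked.update(b)                  # stack: add counts of shared indices
    if len(stacked) <= k:
        return dict(stacked)
    # C_{k+1}: the (k+1)-th largest stacked count.
    c = sorted(stacked.values(), reverse=True)[k]
    # Subtract it from every count; at most k counts stay positive.
    return {i: v - c for i, v in stacked.items() if v > c}
```

`mg_merge({1: 4, 3: 5, 11: 1, 14: 2}, {1: 2, 3: 1, 5: 2, 9: 5, 14: 4}, 5)` returns {1: 5, 3: 5, 5: 1, 9: 4, 14: 5}, matching the merged summary on the slide.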
