Massive scale-out of expensive continuous queries
E. Zeitler and T. Risch
Presentation by Thomas Pasquier
Stream splitting
Splitstream
● splitstream(stream s, int q, function bfn, function rfn)
● user defines rfn (routing function)
  ○ int rfn(int q, tuple t)
● user defines bfn (broadcast function)
  ○ bool bfn(int q, tuple t)
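The signature above can be illustrated with a minimal single-process sketch. It is not the authors' implementation. It assumes that bfn is checked before rfn, that rfn returns the index of one of the q output streams, and that a tuple whose index falls outside 0..q-1 is omitted. The example functions is_control and by_key are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Tuple


def splitstream(
    s: Iterable[Any],
    q: int,
    bfn: Callable[[int, Any], bool],
    rfn: Callable[[int, Any], int],
) -> List[List[Any]]:
    """Split stream s into q output streams (modelled here as lists)."""
    outputs: List[List[Any]] = [[] for _ in range(q)]
    for t in s:
        if bfn(q, t):
            # Broadcast: every output stream receives the tuple.
            for out in outputs:
                out.append(t)
        else:
            i = rfn(q, t)
            # Assumption of this sketch: an out-of-range index means "omit".
            if 0 <= i < q:
                outputs[i].append(t)
    return outputs


# Hypothetical user-defined functions for tuples of the form (key, payload).
def is_control(q: int, t: Tuple[int, str]) -> bool:
    return t[1] == "ctrl"


def by_key(q: int, t: Tuple[int, str]) -> int:
    return t[0] % q


stream = [(0, "a"), (1, "b"), (2, "ctrl"), (3, "c")]
result = splitstream(stream, 2, is_control, by_key)
```

Here the control tuple reaches both output streams, while every other tuple goes to exactly one stream chosen by its key.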
Naive implementation
Tree-shaped implementation: maxtree
(from "Scalable Splitting of Massive Data Streams", Erik Zeitler and Tore Risch)
Parasplit
Parasplit*
Evaluation: network bound
Window router stream rate
● If w is large enough, the stream rate is bound by the network
● However, performance decreases when p is large (the authors state that the reason is unknown)
Evaluation: parasplit*
● Less degradation when using parasplit*
Comparison of different solutions
Cost model and heuristic
Cost model for window router
Cpr = cr + cs + ce
● cr : read cost
● cs : split cost
● ce : emit cost
Cost model for window splitter
Cps = crw + cs (o + r + q·b) + ce (r + q·b)
● crw : read cost per window
● cs : split cost per tuple
● ce : emit cost per tuple
● o : percentage of omitted tuples
● r : percentage of routed tuples
● b : percentage of broadcast tuples
● o + r + b = 100%
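The window splitter formula can be checked with a few lines of Python. This is only a sketch: the parameter names mirror the slide's symbols, the percentages are passed as fractions that sum to 1, and the numeric values are made up for illustration (arbitrary cost units).

```python
import math


def window_splitter_cost(c_rw: float, c_s: float, c_e: float,
                         o: float, r: float, b: float, q: int) -> float:
    """Cps = crw + cs*(o + r + q*b) + ce*(r + q*b), with o + r + b = 1."""
    if not math.isclose(o + r + b, 1.0):
        raise ValueError("o, r and b must sum to 1 (100%)")
    # As written on the slide, a broadcast tuple counts q times in both terms.
    return c_rw + c_s * (o + r + q * b) + c_e * (r + q * b)


# Hypothetical values: no read cost, 1% broadcast, nothing omitted, q = 10.
cost = window_splitter_cost(c_rw=0.0, c_s=1.0, c_e=2.0,
                            o=0.0, r=0.99, b=0.01, q=10)
```

With these values both weighted sums equal 0.99 + 10 × 0.01 = 1.09, so the cost is 1.09 × (1.0 + 2.0) = 3.27. Even a small broadcast percentage therefore grows the cost linearly in q.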
Cost model for query processor
Cpq = cr + p (cp + cm) + O
● cr : read cost per tuple
● cp : poll cost
● cm : merge cost
● O : cost for executing the query and emitting the results
Cost model for parasplit
● Cpr = crw + cs + ce
● Cps = crw + cs (o + r + q·b) + ce (r + q·b)
● Cpq = cr + p (cp + cm) + O
Heuristic for estimating p
● We search for p such that (condition shown as an equation on the original slide)
● Assume:
  ○ 1% broadcast tuples
  ○ 0% omitted
  ○ crw = 0
● Cps = crw + cs (o + r + q·b) + ce (r + q·b)
● We estimate cs + ce by measuring the maximum stream rate
● We can then estimate p, given the desired stream rate
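Under the stated assumptions (o = 0, b = 1%, r = 99%, crw = 0) the splitter cost simplifies to Cps = (cs + ce)(0.99 + 0.01·q). The sketch below turns this into an estimate of p. Several steps are assumptions of the sketch rather than statements from the slides: costs are read as time per tuple, cs + ce is taken as the inverse of the measured maximum rate of one splitter, and p splitters are assumed to scale linearly, so p is the smallest integer with p / Cps at least the desired rate. The function name estimate_p and the example rates are hypothetical.

```python
import math


def estimate_p(desired_rate: float, max_single_rate: float,
               q: int, b: float = 0.01) -> int:
    """Estimate the number of window splitters p for a desired stream rate.

    desired_rate and max_single_rate are in tuples per second.
    """
    # Assumption: one splitter at its maximum rate spends cs + ce per tuple.
    c_s_plus_c_e = 1.0 / max_single_rate
    # Simplified Cps with o = 0, r = 1 - b and crw = 0.
    c_ps = c_s_plus_c_e * ((1.0 - b) + q * b)
    # Assumption: p splitters deliver p / Cps tuples per second in total.
    return math.ceil(desired_rate * c_ps)


# Hypothetical measurement: one splitter sustains 100,000 tuples/s,
# the target is 1,000,000 tuples/s and there are q = 64 query processors.
p = estimate_p(desired_rate=1_000_000, max_single_rate=100_000, q=64)
```

For these numbers the factor is 0.99 + 0.64 = 1.63, so the target needs about 16.3 splitters and the estimate rounds up to 17.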
Efficiency
● Measures the additional work incurred by executing parasplit compared with executing a window splitter in a single process
● Useful work:
  ○ p·Cps
● Overhead:
  ○ Cpr
  ○ q·Cpq with O = 0
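The two quantities above can be combined into a single number. The slides list useful work and overhead but the exact efficiency formula is not preserved here, so the ratio useful / (useful + overhead) used below is an assumption of this sketch, as are the function names and the example costs.

```python
import math


def query_processor_cost(c_r: float, c_p: float, c_m: float,
                         p: int, query_cost: float = 0.0) -> float:
    """Cpq = cr + p*(cp + cm) + O; query_cost plays the role of O."""
    return c_r + p * (c_p + c_m) + query_cost


def efficiency(c_pr: float, c_ps: float, c_pq: float, p: int, q: int) -> float:
    """Assumed definition: useful work divided by total work."""
    useful = p * c_ps              # useful work: p * Cps
    overhead = c_pr + q * c_pq     # overhead: Cpr + q * Cpq (with O = 0)
    return useful / (useful + overhead)


# Hypothetical costs in arbitrary units, with p = 4 splitters and q = 8 queries.
c_pq = query_processor_cost(c_r=0.5, c_p=0.1, c_m=0.15, p=4)  # O = 0
eff = efficiency(c_pr=1.0, c_ps=4.0, c_pq=c_pq, p=4, q=8)
```

Here Cpq = 0.5 + 4 × 0.25 = 1.5, useful work is 16 and overhead is 1 + 12 = 13, giving an efficiency of 16 / 29, roughly 0.55.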
Evaluation: efficiency
Related publications
● Event-based Systems: Opportunities and Challenges at Exascale, Brenna et al., 2009
  ○ stream splitting shown to be a bottleneck
● MapReduce Online, Condie et al., 2010
  ○ does not handle scalable stream splitting
Thank you
Questions?