Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing
Nesime Tatbul, Uğur Çetintemel, Stan Zdonik

Talk Outline
• Problem Introduction
• Approach Overview
• Advance Planning with an LP Solver
• Advance Planning with FIT
• Performance Results
• Related Work
• Conclusions and Future Work
Nesime Tatbul, ETH Zurich. VLDB 2007, Vienna

Distributed Stream Processing
The Aurora/Borealis System

Bursty Workload
• Data can arrive fast, in unpredictable bursts
  • Example: Network traffic data
• Bursts may create resource bottlenecks: query processing slows down and results get delayed!
Source: Internet Traffic Archive, http://ita.ee.lbl.gov/

Models and Assumptions
• We focus on CPU as the limited resource.
• Load shedding is achieved by inserting probabilistic drop operators into query plans.
  • Random Drop [VLDB'03], Window Drop [VLDB'06]
• Approximate result is a subset of the original result.
• The goal is to maximize the total weighted query throughput (e.g., [Ayad et al, SIGMOD'04, Amini et al, ICDCS'06]).
• Servers are arranged in a tree-like topology.

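To make the idea of a probabilistic drop operator concrete, here is a minimal Python sketch. It is an illustration only, not the Random Drop operator from the cited work: the name `random_drop` and the `keep_probability` parameter are hypothetical, and a stream is modeled as a plain iterable of tuples.

```python
import random

def random_drop(stream, keep_probability, rng=None):
    """Yield each tuple of `stream` independently with probability `keep_probability`.

    Hypothetical helper for illustration. Tuples are only ever removed,
    never altered or reordered, so the output is a subset of the input.
    """
    if rng is None:
        rng = random.Random()
    for tup in stream:
        # rng.random() is uniform on [0, 1), so 1.0 keeps everything
        # and 0.0 drops everything.
        if rng.random() < keep_probability:
            yield tup

# Keep roughly a quarter of a 1000-tuple stream.
kept = list(random_drop(range(1000), 0.25, random.Random(7)))
```

Because the operator only filters, anything computed downstream sees a subset of the original input, which is what the "approximate result is a subset" assumption relies on when all operators are filters with selectivity at most 1.
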
Distributed Load Shedding
Key Observation: Load Dependency

[Figure: two queries run through server nodes A and then B. Each query has input rate 1 tuple/sec and output rate 1/4 tuple/sec. Query 1: cost 1 at A, cost 3 at B. Query 2: cost 2 at A, cost 1 at B. All selectivities are 1.0.]

Plan  Rates at A  A.load  A.throughput  B.load  B.throughput
0     1, 1        3       1/3, 1/3      4/3     1/4, 1/4
1     1, 0        1       1, 0          3       1/3, 0        (optimal for A)
2     0, 1/2      1       0, 1/2        1/2     0, 1/2        (feasible for both)
3     1/5, 2/5    1       1/5, 2/5      1       1/5, 2/5      (optimal for both)

Goal: keep A.load ≤ 1 and B.load ≤ 1 while maximizing throughput.

Server nodes must coordinate in their load shedding decisions to achieve high-quality results.

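The numbers in the Load Dependency table can be reproduced with a few lines of Python. This is a sketch under one stated assumption that is not on the slide: an overloaded node (load above its capacity of 1) scales all of its streams down uniformly by capacity / load. The names `run_node`, `evaluate`, and `PLANS` are made up for this example.

```python
from fractions import Fraction as F

# Per-tuple CPU costs from the slide: query 1 costs 1 at A and 3 at B,
# query 2 costs 2 at A and 1 at B. All selectivities are 1.0.
COST_A = (F(1), F(2))
COST_B = (F(3), F(1))

def run_node(rates, costs, capacity=F(1)):
    """Return (load, output rates) for one node.

    Assumption: when load exceeds capacity, every stream is scaled
    by capacity / load; otherwise rates pass through unchanged.
    """
    load = sum(r * c for r, c in zip(rates, costs))
    scale = min(F(1), capacity / load) if load > 0 else F(1)
    return load, tuple(r * scale for r in rates)

def evaluate(rates_at_a):
    """Chain node A into node B and report both loads and throughputs."""
    a_load, a_out = run_node(rates_at_a, COST_A)
    b_load, b_out = run_node(a_out, COST_B)
    return a_load, a_out, b_load, b_out

# Rates admitted at A under each plan in the table.
PLANS = {
    0: (F(1), F(1)),
    1: (F(1), F(0)),
    2: (F(0), F(1, 2)),
    3: (F(1, 5), F(2, 5)),
}

for plan_id, rates in PLANS.items():
    a_load, a_out, b_load, b_out = evaluate(rates)
    print(plan_id, a_load, a_out, b_load, b_out)
```

Plan 1 fills A exactly but pushes a load of 3 onto B, which is why a choice that looks best locally at A is not best for the pair of nodes.
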
Distributed Load Shedding as a Linear Optimization Problem

[Figure: D query inputs pass through a chain of N server nodes with capacities ζ_1, ..., ζ_N. Input j is labeled with rate r_j and drop variable x_j; at node i it has cost c_{i,j} and selectivity s_{i,j}; its output has weight p_j.]

Find $x_j$ such that for all nodes $0 < i \le N$:

$$\sum_{j=1}^{D} r_j \times x_j \times s_j^i \times c_{i,j} \le \zeta_i$$

$$0 \le x_j \le 1$$

$$\sum_{j=1}^{D} r_j \times x_j \times s_j \times p_j \ \text{is maximized.}$$

Problem formulation for non-linear query plans (i.e., with operator splits and merges) is in the paper.

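As a worked instance of this formulation, the sketch below solves the two-query, two-server example from the Load Dependency slide. It does not use an LP solver library; it swaps in brute-force vertex enumeration, which is only practical for two variables. The assumptions are unit capacities (ζ_A = ζ_B = 1), unit weights (p_1 = p_2 = 1), and all selectivities equal to 1. The function name `solve_2d_lp` is hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

def solve_2d_lp(constraints, objective):
    """Maximize objective . x subject to a . x <= b for each (a, b).

    Brute-force vertex enumeration for exactly two variables: intersect
    every pair of constraint boundaries and keep the best feasible point.
    Returns (best value, best point), or None if no vertex is feasible.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:  # parallel boundaries: no unique intersection
            continue
        # Cramer's rule for the 2x2 system a1 . x = b1, a2 . x = b2.
        x = ((b1 * a2[1] - a1[1] * b2) / det,
             (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * x[0] + a[1] * x[1] <= b for a, b in constraints):
            value = objective[0] * x[0] + objective[1] * x[1]
            if best is None or value > best[0]:
                best = (value, x)
    return best

r = (F(1), F(1))  # input rates r_j (tuples/sec)
p = (F(1), F(1))  # throughput weights p_j (assumed equal)
constraints = [
    ((r[0] * 1, r[1] * 2), F(1)),  # node A: r_j * x_j * c_{A,j} summed <= 1
    ((r[0] * 3, r[1] * 1), F(1)),  # node B: r_j * x_j * c_{B,j} summed <= 1
    ((F(1), F(0)), F(1)),          # x_1 <= 1
    ((F(0), F(1)), F(1)),          # x_2 <= 1
    ((F(-1), F(0)), F(0)),         # x_1 >= 0
    ((F(0), F(-1)), F(0)),         # x_2 >= 0
]
objective = (r[0] * p[0], r[1] * p[1])

value, x = solve_2d_lp(constraints, objective)
print(value, x)
```

The optimum lands where both node constraints are tight, at x = (1/5, 2/5) with total throughput 3/5, which matches Plan 3 in the Load Dependency table.
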
Architectural Overview
Centralized vs. Distributed

Task                 Centralized  Distributed
Advance Planning     Coordinator  All
Load Monitoring      Coordinator  All
Plan Selection       Coordinator  All
Plan Implementation  All          All

Architectural Overview
Centralized Approach
[Figure: each server node holds a local plan and sends statistics to the coordinator; the coordinator holds the global plan and sends a plan-id back to the nodes.]

Architectural Overview
Distributed Approach
[Figure: FITs are passed between neighboring server nodes.]
Feasible Input Table (FIT): (r_1, .., r_n, [local plan], quality)

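The entry layout (r_1, .., r_n, [local plan], quality) maps naturally onto a small record type. The sketch below is one possible reading, not the construction used in the talk: it assumes a FIT can be approximated by sampling input rates on a uniform grid, keeping the combinations whose load fits the node's capacity, and scoring each by weighted output rate with all selectivities equal to 1. `FitEntry`, `build_fit`, and the parameters `steps` and `max_rate` are hypothetical names.

```python
from dataclasses import dataclass
from fractions import Fraction as F
from itertools import product
from typing import Optional, Tuple

@dataclass(frozen=True)
class FitEntry:
    rates: Tuple[F, ...]       # (r_1, .., r_n): a feasible input rate combination
    local_plan: Optional[str]  # optional local plan; left unset in this sketch
    quality: F                 # score of this rate combination

def build_fit(costs, weights, capacity=F(1), steps=5, max_rate=F(1)):
    """Enumerate grid points in [0, max_rate]^n and keep the feasible ones.

    Assumptions: load is the sum of rate * cost, and quality is the
    weighted sum of rates (all selectivities taken to be 1).
    """
    grid = [max_rate * k / steps for k in range(steps + 1)]
    table = []
    for rates in product(grid, repeat=len(costs)):
        load = sum(r * c for r, c in zip(rates, costs))
        if load <= capacity:
            quality = sum(r * w for r, w in zip(rates, weights))
            table.append(FitEntry(rates, None, quality))
    return table

# FIT for node B of the Load Dependency example (costs 3 and 1).
fit_b = build_fit(costs=(F(3), F(1)), weights=(F(1), F(1)))
best_b = max(fit_b, key=lambda entry: entry.quality)
print(len(fit_b), best_b)
```

Note that the best entry for B in isolation admits only query 2 at full rate, while an upstream node A with costs (1, 2) could not deliver that rate. This is the same load dependency that makes it necessary to share feasibility information between nodes.
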