An Adaptive Query Execution Engine for Data Integration
Zachary Ives, Daniela Florescu, Marc Friedman, Alon Levy, Daniel S. Weld University of Washington
Presented by Peng Li@CS.UBC
Outline
The Background of Data Integration Systems
Data Integration System
Multiple autonomous (we can’t affect the behavior of the sources), heterogeneous (different models and schemas) data sources
the data sources;
sources;
cost of accessing each source and so on
Why does the system need the fragment structure?
The optimizer’s cardinality estimate for the fragment’s result differs significantly from the actual size -> reinvoke the optimizer
The execution engine checks properties of the result to select the next fragment
Reschedule if a source times out
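A minimal sketch of why fragment boundaries matter: each fragment materializes its result, so the engine can compare the actual size with the optimizer’s estimate before it commits to the rest of the plan. The names `Fragment`, `execute`, `reoptimize`, and the factor-of-two `threshold` are assumptions made for illustration; they are not Tukwila’s actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Fragment:
    name: str
    estimated_rows: int                 # optimizer's cardinality estimate
    run: Callable[[], List[tuple]]      # materializes the fragment's result


def execute(fragments, reoptimize, threshold=2.0) -> List[Tuple[str, object]]:
    """Run fragments one at a time; re-plan the remainder on a bad estimate."""
    log = []
    pending = list(fragments)
    while pending:
        frag = pending.pop(0)
        rows = frag.run()
        log.append((frag.name, len(rows)))
        # Ratio of actual to estimated size (both clamped to at least 1).
        ratio = max(len(rows), 1) / max(frag.estimated_rows, 1)
        if ratio > threshold or ratio < 1 / threshold:
            # Hypothetical hook: hand the remaining fragments back to the optimizer.
            pending = reoptimize(pending, frag, len(rows))
            log.append(("reoptimize", frag.name))
    return log


def demo():
    f1 = Fragment("read_orders", 10, lambda: [(i,) for i in range(100)])
    f2 = Fragment("join_ups", 50, lambda: [(i,) for i in range(50)])
    f3 = Fragment("select", 5, lambda: [(i,) for i in range(5)])
    # Stand-in "optimizer": simply reverses the remaining fragments.
    return execute([f1, f2, f3], lambda rest, done, n: list(reversed(rest)))
```

In `demo()`, the first fragment returns ten times the estimated number of rows, so the remaining two fragments are handed to the stand-in optimizer before execution continues. Without fragment boundaries there would be no point at which the actual size is known and the plan can still change.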
– Absence of statistics
– Unpredictable data arrival characteristics
– Overlap and redundancy among sources
– Optimizing the time to initial answers
– To help discussion, more specific situations will be given
– But you may assume any problem or situation
– Form 8 groups (3 to 4 people per group, two teams per topic)
– Discuss Q1 and Q2 for one topic (5 to 7 minutes)
Join[Orders.TrackNo = UPS.TrackNo](Orders, UPS)

Join result:
OrderNo   TrackNo    Status
1234      01-23-45   In Transit
1235      02-90-85   Delivered
1399      02-90-85   Delivered
1500      03-99-10   Delivered

Orders:
OrderNo   TrackNo
1234      01-23-45
1235      02-90-85
1399      02-90-85
1500      03-99-10

UPS:
TrackNo    Status
01-23-45   In Transit
02-90-85   Delivered
03-99-10   Delivered
04-08-30   Undeliverable
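The join in this example can be reproduced with a few lines of Python. This is only a sketch of an ordinary hash-based equi-join over the slide’s sample rows; the function name `join_on_trackno` is made up for illustration.

```python
# Sample rows copied from the slide.
orders = [
    (1234, "01-23-45"),
    (1235, "02-90-85"),
    (1399, "02-90-85"),
    (1500, "03-99-10"),
]
ups = [
    ("01-23-45", "In Transit"),
    ("02-90-85", "Delivered"),
    ("03-99-10", "Delivered"),
    ("04-08-30", "Undeliverable"),
]


def join_on_trackno(orders, ups):
    """Equi-join on TrackNo: index UPS by TrackNo, then probe with each order."""
    status_by_track = {}
    for track_no, status in ups:
        status_by_track.setdefault(track_no, []).append(status)
    return [
        (order_no, track_no, status)
        for order_no, track_no in orders
        for status in status_by_track.get(track_no, [])
    ]


result = join_on_trackno(orders, ups)
```

The UPS row with TrackNo 04-08-30 has no matching order, so it does not appear in `result`, while TrackNo 02-90-85 matches two orders and appears twice.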
model
scheduling
Query: “Show which orders have been delivered”

Select[Status = “Delivered”]
  Join[Orders.TrackNo = UPS.TrackNo]
    Read Orders
    Read UPS
Rule: When(closed(1)): if size_of(Orders) > 1000 then reoptimize {2, 3}

Select[Status = “Delivered”]
  Join[Orders.TrackNo = UPS.TrackNo]
    Read Orders
    Read UPS
(1) (2) (3)
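The rule above has an event-condition-action shape: an event (`closed(1)`), a condition (`size_of(Orders) > 1000`), and an action (`reoptimize {2, 3}`). A minimal sketch of how such rules might be evaluated follows; `Rule`, `RuleEngine`, and the `stats` dictionary are hypothetical names for illustration and do not reflect Tukwila’s real rule language or runtime.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    event: str                                  # e.g. "closed(1)"
    condition: Callable[[Dict[str, int]], bool]  # evaluated on runtime statistics
    action: Callable[[], None]


class RuleEngine:
    def __init__(self):
        self.rules: List[Rule] = []
        self.stats: Dict[str, int] = {}         # runtime statistics gathered so far

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str) -> None:
        """Run the action of every rule whose event matches and condition holds."""
        for rule in self.rules:
            if rule.event == event and rule.condition(self.stats):
                rule.action()


def demo(orders_size: int) -> List[str]:
    actions: List[str] = []
    engine = RuleEngine()
    engine.add(Rule(
        event="closed(1)",
        condition=lambda s: s["size_of(Orders)"] > 1000,
        action=lambda: actions.append("reoptimize {2, 3}"),
    ))
    engine.stats["size_of(Orders)"] = orders_size
    engine.fire("closed(1)")                    # operator (1) has finished
    return actions
```

Calling `demo(1500)` triggers the re-optimization action, while `demo(500)` leaves the plan unchanged because the condition is false.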
The results show that Tukwila’s strategy of interleaving planning and execution can sharply reduce the total time spent processing a query, with a total speedup of 1.42 over pipelining and 1.69 over the naïve strategy of materializing.
A B C
Mirror_C
Tukwila A B
Orders:
OrderNo   TrackNo
1234      01-23-45
1235      02-90-85
1399      02-90-85
……        ……

UPS:
TrackNo    Status
01-23-45   In Transit
02-90-85   Delivered
03-99-10   Delivered
……         ……
Hash Table (Orders) Hash Table (UPS) 01-23-45
Join[Orders.TrackNo = UPS.TrackNo](Orders, UPS)
matches as output
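The two hash tables on this slide suggest a symmetric (double pipelined) hash join: each arriving tuple is inserted into its own side’s hash table and immediately probed against the other side’s table, so matches are emitted as soon as both halves have arrived. The sketch below assumes that reading of the slide; the function name, the `(side, key, payload)` tuple layout, and the arrival order in `demo` are illustrative assumptions, and memory overflow handling is left out.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals):
    """Join tuples arriving in any interleaving from two sources.

    arrivals: iterable of (side, track_no, payload), side is "orders" or "ups".
    Yields (order_no, track_no, status) as soon as a match exists.
    """
    tables = {"orders": defaultdict(list), "ups": defaultdict(list)}
    for side, track_no, payload in arrivals:
        other = "ups" if side == "orders" else "orders"
        # 1. Insert the new tuple into its own side's hash table.
        tables[side][track_no].append(payload)
        # 2. Probe the other side's hash table and emit matches immediately.
        for match in tables[other].get(track_no, []):
            if side == "orders":
                yield (payload, track_no, match)
            else:
                yield (match, track_no, payload)


def demo():
    arrivals = [
        ("orders", "01-23-45", 1234),
        ("ups", "02-90-85", "Delivered"),
        ("orders", "02-90-85", 1235),
        ("ups", "01-23-45", "In Transit"),
        ("orders", "02-90-85", 1399),
        ("ups", "03-99-10", "Delivered"),
    ]
    return list(symmetric_hash_join(arrivals))
```

Neither input has to be read completely before the first result is produced, which fits the goal of optimizing the time to initial answers when sources deliver data at unpredictable rates. The cost is that both hash tables are kept in memory.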
Join A B QA QB