query processing
play

Query Processing University of Wisconsin Madison Specializing in - PowerPoint PPT Presentation


  1. ✆ ☛☞ ✁ ✑ ✪ ☎ ✫ ✔ ☎ ✡ ✑ ✑ ✬ ✆ ☛ ✖✭ ✑ ✑ ✮ ✌ ✢ � ✣ ✡ ✆ ✁ ✂ ✄☎ ✳ ✝ ✛ ✑ ✞ ✟ ✤ ✠ ☎ ✝ ✑ ✞✩ ✛ ✪ ✣ ✤ ✕ ✆ ✡ ✄ ✎ ✧ ✓ ✖ ✲ ✌✭ ✫ ✌ ✣ ✆ ✤ ✤ ✣ ✑ ✳ ✳ ✳ ✳ ✳ ✂ ✝ ✢ ☎ ☎ ✠✯ ✌✰ ✖ ✄ ✂ ✧ ✁ ✂ ✡ ✄ ✍ ✖ ✄ ☛ ✱ ✡ ✆ ✲ ✌ ☛ ✂ ☎ ✔ ✍ ✍ ✄ ✌✎ ✓ ☎ ✟ ☛☞ ✕ ✂ ✖ ✡ ✑ ✁✗ ✘✙ ✌ ✡ ✜ ✂ � � � � ✁ ✂ ✄☎ ✆ ☎ ✝ ✞ ✆ ✟ ✠ ☎ ✝ ✚✛ ✡ ✖ ✌ ✦ ☎ ☞ ✌ ✍ ✡ ✄ ✎ ✖ ✏ ✑ ✓ ☎ ✔✟ ✕ ✂ ✖ ✖ ✍ ✌✥ ✤ ✌ ✄ ✍ ✡✢ ✌ ✣✤ ✌ ☞ ✣ � ☎ Opening Game - Who am I? Professor at the Query Processing University of Wisconsin – Madison Specializing in database performance issues (i.e. joins) Jon Frankel, Noi Jencharat, Ened Ketri, Bonus : What stream Anurag Maskey, Andy See, Larissa Smelkov system have I worked on? 3/25/03 Query Processing – Papers Query Processing – Today ’ s Agenda ����������� ✏✒✑ 1:40 Motivation & Setup Examples ��������������������������������������������� �������� 2:20 Rate Based Query Paper ✡✟★✧ 2:50 Break ��������������������������������������� �������� 3:00 Window Joins Paper �!���������"�#�������������� �������$����������%�������#����������&������� 3:30 K-Constraints Paper �����'���� ������ 4:00 Discussion 1

  2. � � � � � � � � 127 Flashback Stream Challenges SQL Query Final Answer? Block Reads? Optimizer Parser/ Optimizer - cost based Cardinality? Translator Rewriter Select * from students Plan where major = ‘ cosi ’ Generator and birthday = ‘ 0325 ’ Evaluator Final Data Rate Based Analysis Stream Challenges MFC – 10/day; JDF 3/day Day New MFC Left JDF Left Final Answer? Rate Based 1 6 6 -> 3 0 3 -> 1 0 Block Reads? Cost 2 9 9 -> 4 0 3 -> 1 1 Cardinality? Estimating 3 12 10 -> 5 2 3 -> 1 3 4 0 2 -> 1 0 3 -> 1 1 5 •Load Shedding 1 -> 0 0 6 •Ad Hoc Queries 7 •Persistent Queries 8 9 2

  3. ✳ ✳ Rate Based Analysis Rate Based Analysis MFC – 10/day; JDF 3/day MFC – 10/day; JDF 3/day Day New MFC Left JDF Left JDF Left MFC Left Day New MFC Left JDF Left JDF Left MFC Left 1 1 6 6 -> 3 0 3 -> 1 0 3 -> 1 3 6 6 -> 3 0 3 -> 1 0 3 -> 1 3 2 2 9 9 -> 4 0 3 -> 1 1 3 -> 1 9 9 9 -> 4 0 3 -> 1 1 3 -> 1 9 3 3 12 10 -> 5 2 3 -> 1 3 3 -> 1 18 12 10 -> 5 2 3 -> 1 3 3 -> 1 18 4 4 0 2 -> 1 0 3 -> 1 1 3 -> 1 15 0 2 -> 1 0 3 -> 1 1 3 -> 1 15 5 5 1 -> 0 0 3 -> 1 12 1 -> 0 0 3 -> 1 12 6 6 3 -> 1 9 3 -> 1 9 7 7 3 -> 1 6 3 -> 1 6 8 8 3 -> 1 3 3 -> 1 3 9 9 3 -> 1 0 3 -> 1 0 9 -> 4 0 127 Flashback – Joins Cost Optimization - Speed?? Predicate Pushdown Select * from students as s, courses as c where s.major = ‘ cosi ’ Coming Up…. and c.dept = ‘ cosi ’ and s.sid = c.sid -Different ways to measure rates - SPJ applicability 3

  4. � � � � � � � � Stream Challenges II Stream Challenges II Blocking Query Operators A A Blocking Query Operators A A (option: pipelined join) B1 C1 (option: pipelined join) B1 Lost/Delayed/Unordered Lost/Delayed/Unordered C B C1 Data Data D D1 C B And yet, benefits are And yet, benefits are F D2 D1 huge … huge … E E D D2 B2 C2 F E E B2 C2 Stock Market – Econ 2A Data is Out there! Stock prices are based on ? (http://biz.yahoo.com/cc/) Thu Mar 20 Times are U.S. Eastern 8:30 am CYCL Centennial Communications Earnings (Q3 2003) 8:30 am DV DeVry Inc. Acquires Ross University 8:30 am ENTG Entegris, Inc. Earnings (Q2 2003) 8:30 am PLXS Plexus Announcement 9:00 am HOLL Hollywood Media Corp. Fourth Quarter and Year-End 2002 9:00 am LEH Lehman Brothers Holdings First Quarter 2003 Earnings 10:00 am CSCO Cisco Systems Announces Agreement to Acquire The Linksys Group, Inc. 10:00 am FNLY Finlay Enterprises, Inc. Earnings (Q4 2002) 10:00 am GIII G-III Apparel Group Earnings (Q4 2003) 10:00 am GLYN Galyan's Trading Company, Inc. Fourth Quarter 2002 10:00 am MWD Morgan Stanley Earnings (Q1 2003) Papa 10:00 am TRMS Trimeris, Inc. Earnings (Q4 2002) S&P Federated Home 10:30 am GPN Global Payments Inc. Earnings (Q3 2003) John’s Dept Stores Depot 11:00 am GDT Biosensor`s Agreement/Drug Eluting Stent Update 11:00 am CRAI Charles River Associates Earnings (Q1 2003) 11:00 am CHKR Checkers Drive-In Restaurants Earnings (Q4 2002) 11:00 am CPWM Cost Plus Earnings (Q4 2002) 11:00 am JCREW J. Crew Group, Inc. Earnings (Q4 2003) 4

  5. � � Query: Short-term Downward Momentum: Stocks & Stream Systems Find all NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes and there has been significant buying pressure Tickers (70% or more of the volume has traded EPS (est) EPS (actual) toward the ask price) in the last 2 minutes. 1. IBM 80.00 1. HD 0.27 1. INTC 15.00 1. HD 22.00 Or by: 2. IBM 80.50 2. HD 0.30 Earnings, News, 2. INTC 22.25 Industry 5. HD 22.75 9. HD 23.00 Aurora Example Join Challenges – Window Options Soldiers Tanks Problem? Aurora Option (by A A (time, ID, pos) (time, ID, pos) 1, S1, A 1, T1, C individual tuple) B1 C1 A S1, S2 1, S2, A Stream Option (slide) C B B T2, S3 C T1 2, S1, A 2, T1, C D D1 A S1 1, S3, B 1, T2, B Tuple vs Timestamp F D2 2, S3, B 2, T2, C B S3 C T1, T2, S2 E E 3, S1, A Order! A T1, S1 3, S3, B 3, T1, A B2 C2 2, S2, C 3, T2, C B S3 C T2, S2 3, S2, B 5

  6. � � � � � � Join Challenges – Window Options Join Challenges – Window Options Aurora Option (by A A Aurora Option (by A A individual tuple) individual tuple) B1 C1 B1 C1 Stream Option (slide) Stream Option (slide) C B C B D D1 D D1 Tuple vs Timestamp Tuple vs Timestamp F D2 F D2 E E E E Order! Order! B2 C2 B2 C2 Join Challenges – Window Options Accuracy – How to Aurora Option (by A A window? individual tuple) B1 C1 Stream Option (slide) C B Coming Up…. D D1 Tuple vs Timestamp F D2 -Joining algorithms E E Order! B2 C2 -Lots of cool graphs 6

  7. ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ Motivations What is Rate? Traditional Optimizers requires cardinality Number of records per a unit of time. of the input…. Output Rate = # output transmitted In streams, cardinality is not known and time needed for transmission inputs come at different rate… Output Rate = #papers RATE-BASED optimization processing time needed Output Rate Estimation Output Rate for Projections case 1: Mitch For Projections Time to read papers is shorter than time For Selections between getting the papers paper 2 paper 3 paper 1 For Joins 1 hour 1/2 hour 1 hour time paper 1 paper 2 paper 3 So the output rate = the input rate 7

  8. ✳ ✳ ✳ ✳ ✳ Output Rate for Projections Output Rate for Projections case 2: Jon In general, time to do projection is low. Time to read papers is longer than time So between getting the papers paper 1 1.5 hour paper 2 Output Rate = Input Rate 1.5 hour r o r i 1 hour time paper 1 paper 2 paper 3 So the output rate = 1/(time to do projection) Output Rate for Selections Output Rate for Selection case 1: Mitch Selectivity (f) = percentage of papers that takes 1/2 hour to read 1 paper, with will be selected selectivity = 0.5 2 hours paper 1 1/2 hour 1 hour time paper 1 paper 2 paper 3 output rate = 1/2 paper/hour So the output rate = f * the input rate 8

  9. ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ Output Rate for Selection Output Rate for Selections case 2: John In general, time to perform selection is less than interval between inputs. takes 1.5 hour to read 1 paper, with selectivity = 0.5 So 3 hours 1.5 hour Output Rate = Selectivity * Input Rate 1 hour r o f * r i time paper 1 paper 2 paper 3 output rate = 1/3 paper/hour (= 1/2 * 1/1.5) So the output rate = f * (1/time to select) Output Rate for Joins Recall Total #’s of papers What are the papers by same author Mitch in output and Jon gives the same grading to? Output Rate = r M = No. of papers Mitch reads per hour # output transmitted r J = No. of papers Jon reads per hour time needed for transmission f = Selectivity of join Total time to do the Join C M = Time to handle reviews from Mitch C J = Time to handle reviews from Jon 9

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend