Query Processing University of Wisconsin Madison Specializing in - PowerPoint PPT Presentation

✆ ☛☞ ✁ ✑ ✪ ☎ ✫ ✔ ☎ ✡ ✑ ✑ ✬ ✆ ☛ ✖✭ ✑ ✑ ✮ ✌ ✢ � ✣ ✡ ✆ ✁ ✂ ✄☎ ✳ ✝ ✛ ✑ ✞ ✟ ✤ ✠ ☎ ✝ ✑ ✞✩ ✛ ✪ ✣ ✤ ✕ ✆ ✡ ✄ ✎ ✧ ✓ ✖ ✲ ✌✭ ✫ ✌ ✣ ✆ ✤ ✤ ✣ ✑ ✳ ✳ ✳ ✳ ✳ ✂ ✝ ✢ ☎ ☎ ✠✯ ✌✰ ✖ ✄ ✂ ✧ ✁ ✂ ✡ ✄ ✍ ✖ ✄ ☛ ✱ ✡ ✆ ✲ ✌ ☛ ✂ ☎ ✔ ✍ ✍ ✄ ✌✎ ✓ ☎ ✟ ☛☞ ✕ ✂ ✖ ✡ ✑ ✁✗ ✘✙ ✌ ✡ ✜ ✂ � � � � ✁ ✂ ✄☎ ✆ ☎ ✝ ✞ ✆ ✟ ✠ ☎ ✝ ✚✛ ✡ ✖ ✌ ✦ ☎ ☞ ✌ ✍ ✡ ✄ ✎ ✖ ✏ ✑ ✓ ☎ ✔✟ ✕ ✂ ✖ ✖ ✍ ✌✥ ✤ ✌ ✄ ✍ ✡✢ ✌ ✣✤ ✌ ☞ ✣ � ☎ Opening Game - Who am I? Professor at the Query Processing University of Wisconsin – Madison Specializing in database performance issues (i.e. joins) Jon Frankel, Noi Jencharat, Ened Ketri, Bonus : What stream Anurag Maskey, Andy See, Larissa Smelkov system have I worked on? 3/25/03 Query Processing – Papers Query Processing – Today ’ s Agenda �� ✏✒✑ 1:40 Motivation & Setup Examples �� 2:20 Rate Based Query Paper ✡✟★✧ 2:50 Break �� 3:00 Window Joins Paper �!��"�#�� $��%��#��&�� 3:30 K-Constraints Paper ��'�� 4:00 Discussion 1

� � � � � � � � 127 Flashback Stream Challenges SQL Query Final Answer? Block Reads? Optimizer Parser/ Optimizer - cost based Cardinality? Translator Rewriter Select * from students Plan where major = ‘ cosi ’ Generator and birthday = ‘ 0325 ’ Evaluator Final Data Rate Based Analysis Stream Challenges MFC – 10/day; JDF 3/day Day New MFC Left JDF Left Final Answer? Rate Based 1 6 6 -> 3 0 3 -> 1 0 Block Reads? Cost 2 9 9 -> 4 0 3 -> 1 1 Cardinality? Estimating 3 12 10 -> 5 2 3 -> 1 3 4 0 2 -> 1 0 3 -> 1 1 5 •Load Shedding 1 -> 0 0 6 •Ad Hoc Queries 7 •Persistent Queries 8 9 2

✳ ✳ Rate Based Analysis Rate Based Analysis MFC – 10/day; JDF 3/day MFC – 10/day; JDF 3/day Day New MFC Left JDF Left JDF Left MFC Left Day New MFC Left JDF Left JDF Left MFC Left 1 1 6 6 -> 3 0 3 -> 1 0 3 -> 1 3 6 6 -> 3 0 3 -> 1 0 3 -> 1 3 2 2 9 9 -> 4 0 3 -> 1 1 3 -> 1 9 9 9 -> 4 0 3 -> 1 1 3 -> 1 9 3 3 12 10 -> 5 2 3 -> 1 3 3 -> 1 18 12 10 -> 5 2 3 -> 1 3 3 -> 1 18 4 4 0 2 -> 1 0 3 -> 1 1 3 -> 1 15 0 2 -> 1 0 3 -> 1 1 3 -> 1 15 5 5 1 -> 0 0 3 -> 1 12 1 -> 0 0 3 -> 1 12 6 6 3 -> 1 9 3 -> 1 9 7 7 3 -> 1 6 3 -> 1 6 8 8 3 -> 1 3 3 -> 1 3 9 9 3 -> 1 0 3 -> 1 0 9 -> 4 0 127 Flashback – Joins Cost Optimization - Speed?? Predicate Pushdown Select * from students as s, courses as c where s.major = ‘ cosi ’ Coming Up…. and c.dept = ‘ cosi ’ and s.sid = c.sid -Different ways to measure rates - SPJ applicability 3

� � � � � � � � Stream Challenges II Stream Challenges II Blocking Query Operators A A Blocking Query Operators A A (option: pipelined join) B1 C1 (option: pipelined join) B1 Lost/Delayed/Unordered Lost/Delayed/Unordered C B C1 Data Data D D1 C B And yet, benefits are And yet, benefits are F D2 D1 huge … huge … E E D D2 B2 C2 F E E B2 C2 Stock Market – Econ 2A Data is Out there! Stock prices are based on ? (http://biz.yahoo.com/cc/) Thu Mar 20 Times are U.S. Eastern 8:30 am CYCL Centennial Communications Earnings (Q3 2003) 8:30 am DV DeVry Inc. Acquires Ross University 8:30 am ENTG Entegris, Inc. Earnings (Q2 2003) 8:30 am PLXS Plexus Announcement 9:00 am HOLL Hollywood Media Corp. Fourth Quarter and Year-End 2002 9:00 am LEH Lehman Brothers Holdings First Quarter 2003 Earnings 10:00 am CSCO Cisco Systems Announces Agreement to Acquire The Linksys Group, Inc. 10:00 am FNLY Finlay Enterprises, Inc. Earnings (Q4 2002) 10:00 am GIII G-III Apparel Group Earnings (Q4 2003) 10:00 am GLYN Galyan's Trading Company, Inc. Fourth Quarter 2002 10:00 am MWD Morgan Stanley Earnings (Q1 2003) Papa 10:00 am TRMS Trimeris, Inc. Earnings (Q4 2002) S&P Federated Home 10:30 am GPN Global Payments Inc. Earnings (Q3 2003) John’s Dept Stores Depot 11:00 am GDT Biosensor`s Agreement/Drug Eluting Stent Update 11:00 am CRAI Charles River Associates Earnings (Q1 2003) 11:00 am CHKR Checkers Drive-In Restaurants Earnings (Q4 2002) 11:00 am CPWM Cost Plus Earnings (Q4 2002) 11:00 am JCREW J. Crew Group, Inc. Earnings (Q4 2003) 4

� � Query: Short-term Downward Momentum: Stocks & Stream Systems Find all NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes and there has been significant buying pressure Tickers (70% or more of the volume has traded EPS (est) EPS (actual) toward the ask price) in the last 2 minutes. 1. IBM 80.00 1. HD 0.27 1. INTC 15.00 1. HD 22.00 Or by: 2. IBM 80.50 2. HD 0.30 Earnings, News, 2. INTC 22.25 Industry 5. HD 22.75 9. HD 23.00 Aurora Example Join Challenges – Window Options Soldiers Tanks Problem? Aurora Option (by A A (time, ID, pos) (time, ID, pos) 1, S1, A 1, T1, C individual tuple) B1 C1 A S1, S2 1, S2, A Stream Option (slide) C B B T2, S3 C T1 2, S1, A 2, T1, C D D1 A S1 1, S3, B 1, T2, B Tuple vs Timestamp F D2 2, S3, B 2, T2, C B S3 C T1, T2, S2 E E 3, S1, A Order! A T1, S1 3, S3, B 3, T1, A B2 C2 2, S2, C 3, T2, C B S3 C T2, S2 3, S2, B 5

� � � � � � Join Challenges – Window Options Join Challenges – Window Options Aurora Option (by A A Aurora Option (by A A individual tuple) individual tuple) B1 C1 B1 C1 Stream Option (slide) Stream Option (slide) C B C B D D1 D D1 Tuple vs Timestamp Tuple vs Timestamp F D2 F D2 E E E E Order! Order! B2 C2 B2 C2 Join Challenges – Window Options Accuracy – How to Aurora Option (by A A window? individual tuple) B1 C1 Stream Option (slide) C B Coming Up…. D D1 Tuple vs Timestamp F D2 -Joining algorithms E E Order! B2 C2 -Lots of cool graphs 6

✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ Motivations What is Rate? Traditional Optimizers requires cardinality Number of records per a unit of time. of the input…. Output Rate = # output transmitted In streams, cardinality is not known and time needed for transmission inputs come at different rate… Output Rate = #papers RATE-BASED optimization processing time needed Output Rate Estimation Output Rate for Projections case 1: Mitch For Projections Time to read papers is shorter than time For Selections between getting the papers paper 2 paper 3 paper 1 For Joins 1 hour 1/2 hour 1 hour time paper 1 paper 2 paper 3 So the output rate = the input rate 7

✳ ✳ ✳ ✳ ✳ Output Rate for Projections Output Rate for Projections case 2: Jon In general, time to do projection is low. Time to read papers is longer than time So between getting the papers paper 1 1.5 hour paper 2 Output Rate = Input Rate 1.5 hour r o r i 1 hour time paper 1 paper 2 paper 3 So the output rate = 1/(time to do projection) Output Rate for Selections Output Rate for Selection case 1: Mitch Selectivity (f) = percentage of papers that takes 1/2 hour to read 1 paper, with will be selected selectivity = 0.5 2 hours paper 1 1/2 hour 1 hour time paper 1 paper 2 paper 3 output rate = 1/2 paper/hour So the output rate = f * the input rate 8

✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ Output Rate for Selection Output Rate for Selections case 2: John In general, time to perform selection is less than interval between inputs. takes 1.5 hour to read 1 paper, with selectivity = 0.5 So 3 hours 1.5 hour Output Rate = Selectivity * Input Rate 1 hour r o f * r i time paper 1 paper 2 paper 3 output rate = 1/3 paper/hour (= 1/2 * 1/1.5) So the output rate = f * (1/time to select) Output Rate for Joins Recall Total #’s of papers What are the papers by same author Mitch in output and Jon gives the same grading to? Output Rate = r M = No. of papers Mitch reads per hour # output transmitted r J = No. of papers Jon reads per hour time needed for transmission f = Selectivity of join Total time to do the Join C M = Time to handle reviews from Mitch C J = Time to handle reviews from Jon 9

Query Processing University of Wisconsin Madison Specializing in - PowerPoint PPT Presentation

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

Online Query Processing Exposure to online query processing algorithms and fundamentals A

V.3 Query Processing 1. Term-at-a-Time 2. Document-at-a-Time 3. WAND 4. Quit & Continue 5.

Efficient query processing Efficient scoring, distributed query processing Web Search 1 Ranking

Query Processing Query Processing Steps balance < 2500 ( balance ( account)) balance

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query

A DELICATE BALANCE: Resource allocation in cardiovascular care Eric A. Cohen MD, FRCPC

Invasive Imaging Improves Prognosis and Saves Lives The Evidence Davide Capodanno, MD, PhD

Current State Over Volume Silos of Utilization over Care Value Fee for Service 2

Total Cost of Care (TCOC) Workgroup July 25, 2018 Agenda Introductions Updates on

Perspectives from 17 years of comprehensive medication management Who is Fairview Health Services?

Impact of C. difficile in Canada What is Clostridium difficile ? Spore-forming bacteria

Micafungin: A Review in the Prophylaxis and Treatment of Invasive Candida Infections in Paediatric

Welcome to Learning Session 1 August 2018 Mihi Whakatau Te Aroha Te aroha Te whakapono Me te

Query Processing University of Wisconsin Madison Specializing in - PowerPoint PPT Presentation

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

Online Query Processing Exposure to online query processing algorithms and fundamentals A

V.3 Query Processing 1. Term-at-a-Time 2. Document-at-a-Time 3. WAND 4. Quit &amp; Continue 5.

Efficient query processing Efficient scoring, distributed query processing Web Search 1 Ranking

Query Processing Query Processing Steps balance &lt; 2500 ( balance ( account)) balance

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Information Retrieval &gt; Query Us User er Query Words Query Words Search Personalization

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query

A DELICATE BALANCE: Resource allocation in cardiovascular care Eric A. Cohen MD, FRCPC

Invasive Imaging Improves Prognosis and Saves Lives The Evidence Davide Capodanno, MD, PhD

Current State Over Volume Silos of Utilization over Care Value Fee for Service 2

Total Cost of Care (TCOC) Workgroup July 25, 2018 Agenda Introductions Updates on

Perspectives from 17 years of comprehensive medication management Who is Fairview Health Services?

Impact of C. difficile in Canada What is Clostridium difficile ? Spore-forming bacteria

Micafungin: A Review in the Prophylaxis and Treatment of Invasive Candida Infections in Paediatric

Welcome to Learning Session 1 August 2018 Mihi Whakatau Te Aroha Te aroha Te whakapono Me te

V.3 Query Processing 1. Term-at-a-Time 2. Document-at-a-Time 3. WAND 4. Quit & Continue 5.

Query Processing Query Processing Steps balance < 2500 ( balance ( account)) balance

Information Retrieval > Query Us User er Query Words Query Words Search Personalization