
CS 744: SCOPE, Shivaram Venkataraman, Fall 2020


  1. CS 744: SCOPE, Shivaram Venkataraman, Fall 2020

  2. ADMINISTRIVIA
     - Assignment grades this week (Thursday)
     - Midterm details on Piazza
     - Course Project Proposal Submission: upload a single PDF file to HotCRP
     - Proposals are peer reviewed anonymously: don't include your names in the PDF, only in HotCRP itself

  3. Course stack, top to bottom:
     - Applications: Machine Learning, SQL, Streaming, Graph
     - Computational Engines (e.g. MapReduce, Spark, Ray)
     - Scalable Storage Systems
     - Resource Management
     - Datacenter Architecture

  4. SQL: STRUCTURED QUERY LANGUAGE
     A language to query a database.

  5. DATABASE SYSTEMS
     - OLAP (online analytical processing)
     - OLTP (online transaction processing), e.g. airline reservation

  6. PROCEDURAL VS. RELATIONAL
     Procedural (Spark, Scala):
         val lines = sc.textFile("users")
         val csv = lines.map(x => x.split(','))
         val young = csv.filter(x => x(1).toInt < 21)  // the age field is text and must be parsed
         println(young.count())
     Relational (SQL):
         SELECT COUNT(*)
         FROM "users"
         WHERE age < 21
     The relational version relies on a schema (a named, typed age column) and is easier to program.
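The contrast on this slide can be reproduced on a laptop. The following is a minimal Python sketch, not the lecture's code: the sample rows are made up, and the standard-library `sqlite3` module stands in for a relational engine.

```python
import sqlite3

# Hypothetical sample of the "users" data as (name, age) rows.
rows = [("ann", 19), ("bob", 34), ("cy", 20), ("dee", 21)]

# Procedural style: spell out how to parse, filter, and count.
lines = [f"{name},{age}" for name, age in rows]
csv = [line.split(",") for line in lines]
young = [fields for fields in csv if int(fields[1]) < 21]
procedural_count = len(young)

# Relational style: declare the schema once, then state what is wanted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
relational_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE age < 21"
).fetchone()[0]

print(procedural_count, relational_count)
```

Both styles give the same count. The difference is that the SQL version leaves the engine free to choose how to execute the query.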

  7. Microsoft SCOPE
         SELECT query, COUNT(*) AS count
         FROM "search.log" USING LogExtractor
         GROUP BY query
         HAVING count > 1000
         ORDER BY count DESC;
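To make the semantics of the query concrete, here is a small Python sketch of what it computes on one machine. The function name `frequent_queries` and the toy log are assumptions for illustration, and the threshold is lowered from 1000 so the toy data produces output.

```python
from collections import Counter

def frequent_queries(log_lines, threshold):
    """Return (query, count) pairs with count > threshold, most frequent first."""
    # GROUP BY query with COUNT(*)
    counts = Counter(line.strip() for line in log_lines)
    # HAVING count > threshold
    kept = [(q, c) for q, c in counts.items() if c > threshold]
    # ORDER BY count DESC
    return sorted(kept, key=lambda qc: qc[1], reverse=True)

log = ["cats", "dogs", "cats", "birds", "cats", "dogs"]
result = frequent_queries(log, threshold=1)
print(result)
```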

  8. SCOPE OPERATORS
     Input reading: what is different from reading an RDD with textFile?
         EXTRACT column[:<type>] [, ...]
         FROM <input_stream(s)>
         USING <Extractor> [(args)]
         [HAVING <predicate>]
     - (1) Schema information: columns and their types are declared up front
     - (2) Pluggable extractor: the Extractor is a class (e.g. for CSV input)
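A rough Python analogue of EXTRACT may help show the two differences. This is a sketch under assumptions: `extract`, `csv_extractor`, and the schema representation are hypothetical names, not SCOPE's API.

```python
def csv_extractor(line, delimiter=","):
    """Hypothetical pluggable extractor: split one text line into raw fields."""
    return line.rstrip("\n").split(delimiter)

def extract(lines, schema, extractor, having=None):
    """Yield typed rows, loosely mirroring EXTRACT ... FROM ... USING ... [HAVING]."""
    for line in lines:
        fields = extractor(line)
        # The schema supplies column names and types, so rows come out typed.
        row = {name: typ(value) for (name, typ), value in zip(schema, fields)}
        if having is None or having(row):
            yield row

schema = [("name", str), ("age", int)]
rows = list(
    extract(["ann,19\n", "bob,34\n"], schema, csv_extractor,
            having=lambda r: r["age"] < 21)
)
print(rows)
```

Because the schema travels with the rows, later operators can refer to `age` by name instead of by position, as `x(1)` does in the Spark example.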

  9. SQL OPERATORS
     - Select: read rows that satisfy some predicate
     - Join: equijoin with support for inner and outer join operators
     - GroupBy: group by some column
     - OrderBy: sorting the output
     - Aggregations: COUNT, SUM, MAX, etc.

  10. LANGUAGE INTEGRATION
         R1 = SELECT A+C AS ac, B.Trim() AS B1
              FROM R
              WHERE StringOccurs(C, "xyz") > 2

         #CS
         public static int StringOccurs(string str, string ptrn) {
             int cnt = 0; int pos = -1;
             while (pos + 1 < str.Length) {
                 pos = str.IndexOf(ptrn, pos + 1);
                 if (pos < 0) break;
                 cnt++;
             }
             return cnt;
         }
         #ENDCS
     - B.Trim() comes from the C# standard library
     - StringOccurs is a custom user-defined function (UDF), written inline in C# between #CS and #ENDCS
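The same idea, a user-defined function called from inside a query, can be tried with Python's `sqlite3` module, whose `create_function` registers a Python callable as a SQL function. This is an analogy rather than SCOPE itself: the table contents are made up, and the `A+C` column from the slide is omitted.

```python
import sqlite3

def string_occurs(s, pattern):
    """Python port of the slide's StringOccurs: count occurrences of pattern in s."""
    cnt = 0
    pos = -1
    while pos + 1 < len(s):
        pos = s.find(pattern, pos + 1)
        if pos < 0:
            break
        cnt += 1
    return cnt

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it by name with 2 arguments.
conn.create_function("StringOccurs", 2, string_occurs)
conn.execute("CREATE TABLE R (A INTEGER, B TEXT, C TEXT)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [(1, " keep ", "xyz-xyz-xyz"), (2, " drop ", "xyz-abc")],
)
r1 = conn.execute(
    "SELECT TRIM(B) AS B1 FROM R WHERE StringOccurs(C, 'xyz') > 2"
).fetchall()
print(r1)
```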

  11. MAPREDUCE-LIKE?
     - Process: like map (a UDF operator applied to input rows)
     - Reduce: like the reduce operator (works on groups)
     - Combine: takes two rowsets as input, like a join
         COMBINE S1 WITH S2
         ON S1.A==S2.A AND S1.B==S2.B AND S1.C==S2.C
         USING MultiSetDifference
         PRODUCE A, B, C
     The ON clause is an equi-join condition and PRODUCE names the output columns. Question raised on the slide: is the combiner commutative?
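As an illustration of what a combiner like MultiSetDifference might compute, here is a Python sketch using `collections.Counter`. The semantics shown (rows of S1 minus rows of S2, respecting multiplicity) are an assumption based on the name. The sketch also shows that this particular operation is not commutative.

```python
from collections import Counter

def multiset_difference(s1, s2):
    """Rows of s1 minus rows of s2, respecting how many times each row occurs."""
    remaining = Counter(s1) - Counter(s2)  # keeps only positive counts
    return sorted(remaining.elements())

# Hypothetical rowsets with columns (A, B, C).
s1 = [(1, "a", "x"), (1, "a", "x"), (2, "b", "y")]
s2 = [(1, "a", "x"), (3, "c", "z")]

diff = multiset_difference(s1, s2)
reverse_diff = multiset_difference(s2, s1)
print(diff, reverse_diff)
```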

  12. EXECUTION: COMPILER
         SELECT query, COUNT(*) AS count
         FROM "search.log" USING LogExtractor
         GROUP BY query
         HAVING count > 1000
         ORDER BY count DESC;
     - Check syntax, resolve names
     - Checks if columns have been defined
     - Result: internal parse tree

  13. OPTIMIZER
     Cost-based optimizer: rewrite the query expression into the plan with the lowest cost.
     Examples:
     - Removing unnecessary columns (read only the columns the query uses)
     - Pushing down selection predicates (filter before grouping)
     - Pre-aggregating (similar to a combiner)
     Also need to reason about partitioning (see the VLDBJ paper).
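Pre-aggregation can be sketched in a few lines of Python. The example below is illustrative only: the partitions are made-up data, and "shipped" simply counts the records that would have to leave their partition under each plan.

```python
from collections import Counter

def count_naive(partitions, threshold):
    """Ship every row to one place, then group, count, and filter."""
    all_rows = [q for part in partitions for q in part]
    counts = Counter(all_rows)
    return {q: c for q, c in counts.items() if c > threshold}, len(all_rows)

def count_preaggregated(partitions, threshold):
    """Count inside each partition first; only partial counts are shipped."""
    partials = [Counter(part) for part in partitions]
    shipped = sum(len(p) for p in partials)
    total = Counter()
    for p in partials:
        total.update(p)
    # The HAVING filter must run after the merge: a partial count can be
    # below the threshold even when the total is above it.
    return {q: c for q, c in total.items() if c > threshold}, shipped

partitions = [["cats", "cats", "dogs"], ["cats", "dogs", "birds"]]
naive, naive_shipped = count_naive(partitions, threshold=1)
pre, pre_shipped = count_preaggregated(partitions, threshold=1)
print(naive, naive_shipped, pre, pre_shipped)
```

Both plans return the same answer, but the pre-aggregated plan moves fewer records. The gap grows as keys repeat more often within a partition.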

  14. RUNTIME OPTIMIZATIONS
     - Hierarchical aggregation: aggregate within a rack first, since links between racks have limited bandwidth
     - Locality-sensitive task placement (similar to Spark / MapReduce)
     - Grouping heuristics? (left vague in the paper)
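Hierarchical aggregation can be pictured as two levels of merging. The Python sketch below assumes a hypothetical layout (a list of racks, each holding per-machine partial counts) and only illustrates the idea, not SCOPE's runtime.

```python
from collections import Counter

def hierarchical_aggregate(racks):
    """racks: list of racks; each rack is a list of per-machine Counters."""
    rack_totals = []
    for machines in racks:
        rack_total = Counter()
        for partial in machines:  # this traffic stays inside the rack
            rack_total.update(partial)
        rack_totals.append(rack_total)
    grand_total = Counter()
    for rack_total in rack_totals:  # one partial result per rack crosses racks
        grand_total.update(rack_total)
    return grand_total, len(rack_totals)

racks = [
    [Counter(cats=2), Counter(cats=1, dogs=1)],
    [Counter(dogs=3), Counter(birds=1)],
]
total, cross_rack_transfers = hierarchical_aggregate(racks)
print(total, cross_rack_transfers)
```

With four machines in two racks, only two partial results cross the rack boundary, instead of four.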

  15. SUMMARY, TAKEAWAYS
     Relational API (with a schema)
     - Enables rich space of optimizations
     - Easy to use, integration with C# UDFs
     SCOPE Execution
     - Compiler to check for errors, generate DAG
     - Optimizer to accelerate queries (static + dynamic)
     Precursor to systems like SparkSQL

  16. DISCUSSION https://forms.gle/hL8VJ6uSG7Lzm164A

  17. Consider you have a column-oriented data layout on your storage system (e.g. Apache Parquet). What are some reasons that a SCOPE query might be faster than running an equivalent MR program?
     Notes from the discussion:
     - The notion of an Extractor fits this layout well: a query that touches a single column can read just that column
     - Pre-filtering is easier to do efficiently in the extractor
     http://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-parquet-and-orc-do-we.html
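One way to see the benefit is to count how many values each layout has to read for the earlier "age < 21" query. The Python sketch below uses made-up data and plain lists and dicts as stand-ins for row and column storage. It is not how Parquet is read in practice.

```python
# Hypothetical row-oriented layout: each record stored together.
row_store = [("ann", 19, "WI"), ("bob", 34, "CA"), ("cy", 20, "WI")]

# Hypothetical column-oriented layout: each column stored together.
column_store = {
    "name": ["ann", "bob", "cy"],
    "age": [19, 34, 20],
    "state": ["WI", "CA", "WI"],
}

def count_young_rows(rows):
    """Scan whole records; every field of every row is read."""
    values_read = 0
    count = 0
    for row in rows:
        values_read += len(row)
        if row[1] < 21:
            count += 1
    return count, values_read

def count_young_columns(columns):
    """Read only the age column, since the query touches nothing else."""
    ages = columns["age"]
    return sum(1 for a in ages if a < 21), len(ages)

print(count_young_rows(row_store), count_young_columns(column_store))
```

A declarative query names the columns it needs, so the system can pick the second plan automatically. A hand-written MR program that parses whole records cannot benefit without being rewritten.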

  18. Does a SCOPE-like optimizer help ML workloads? Consider the code in your Assignment 2. What parts of your code would benefit and what parts would not?
     Notes from the discussion:
     - Feature extraction, joins, and column filtering could benefit, but are they rare in ML workloads?
     - Other optimizations: caching intermediate outputs
     - Join strategy (hash join vs. sort-merge join) is a choice for the optimizer

  19. NEXT STEPS
     - Next class: Elastic Data Warehousing with Snowflake
     - Project proposals due tomorrow! See Piazza!
     - Midterm coming up!
