CS 744: SCOPE Shivaram Venkataraman Fall 2020 ADMINISTRIVIA - - PowerPoint PPT Presentation

SLIDE 1

CS 744: SCOPE

Shivaram Venkataraman Fall 2020

Hello!

SLIDE 2

ADMINISTRIVIA

  • Assignment grades this week
  • Midterm details on Piazza
  • Course Project Proposal Submission

Notes:
  • Next Thursday
  • Single PDF file (convert any photos to PDF)
  • Proposals via HotCRP, with peer review
  • Anonymous PDF upload: don't include your names in the PDF; include them only in HotCRP itself

SLIDE 3

  • Scalable Storage Systems
  • Datacenter Architecture
  • Resource Management
  • Computational Engines: MapReduce, Spark, Ray
  • Applications: Machine Learning, SQL, Streaming, Graph

SLIDE 4

SQL: STRUCTURED QUERY LANGUAGE

A language to query a database

SLIDE 5

DATABASE SYSTEMS

  • OLAP (online analytical processing)
  • OLTP (online transaction processing), e.g., airline reservation

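To make the OLTP/OLAP distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `flights` table, its columns, and its rows are hypothetical: the OLTP part is a short transaction that books one seat (a point update, as in an airline reservation), while the OLAP part is an analytical query that scans and aggregates the whole table.

```python
import sqlite3

def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per flight with its remaining seats.
    conn.execute("CREATE TABLE flights (id INTEGER PRIMARY KEY, origin TEXT, seats_left INTEGER)")
    conn.executemany(
        "INSERT INTO flights (id, origin, seats_left) VALUES (?, ?, ?)",
        [(1, "MSN", 3), (2, "ORD", 0), (3, "MSN", 10)],
    )

    # OLTP: a small transaction touching one row (book a seat on flight 1).
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE flights SET seats_left = seats_left - 1 WHERE id = ? AND seats_left > 0",
            (1,),
        )
        booked = cur.rowcount == 1

    # OLAP: an analytical query that scans and aggregates many rows.
    summary = conn.execute(
        "SELECT origin, SUM(seats_left) FROM flights GROUP BY origin ORDER BY origin"
    ).fetchall()
    conn.close()
    return booked, summary

booked, summary = demo()
print(booked, summary)
```

Real OLTP and OLAP systems are usually separate engines tuned for these two access patterns; a single SQLite file only illustrates the shape of the queries.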
SLIDE 6

PROCEDURAL VS. RELATIONAL

Relational (SQL):

    SELECT COUNT(*) FROM users WHERE age < 21

Procedural (Spark, Scala):

    val lines = sc.textFile("users")
    val csv = lines.map(x => x.split(","))
    val young = csv.filter(x => x(1).toInt < 21)
    println(young.count())

Notes:
  • Relational: relies on a schema, so the engine knows that age is a column
  • Procedural: works on raw data; the program must pick out the age field itself and convert it to an int
  • SQL is easy to write

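A small Python analogue of the two styles, assuming a hypothetical headerless CSV layout `name,age`: the procedural version spells out how to parse and filter each line, like the Spark code, while the relational version declares what to compute and leaves the how to the engine (here `sqlite3`).

```python
import sqlite3

# Hypothetical contents of a "users" file in name,age format.
USERS_CSV = "alice,19\nbob,34\ncarol,20\ndave,21"

def count_young_procedural(text):
    # Procedural: we decide how to split, parse, and filter each record.
    rows = [line.split(",") for line in text.splitlines()]
    young = [r for r in rows if int(r[1]) < 21]  # age is a string until converted
    return len(young)

def count_young_relational(text):
    # Relational: load once under a schema, then declare what we want.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(n, int(a)) for n, a in (line.split(",") for line in text.splitlines())],
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM users WHERE age < 21").fetchone()
    conn.close()
    return count

print(count_young_procedural(USERS_CSV), count_young_relational(USERS_CSV))
```

Because the relational version hands the engine a declarative query over a known schema, the engine is free to choose how to evaluate it; the procedural version fixes the evaluation order in code.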
SLIDE 7

SCOPE

    SELECT query, COUNT(*) AS count
    FROM "search.log"
    USING LogExtractor
    GROUP BY query
    HAVING count > 1000
    ORDER BY count DESC;

Notes:
  • SCOPE is from Microsoft
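For intuition, the SCOPE query above computes roughly what the following Python does. This is a sketch assuming one search query per log line; the real `LogExtractor` is a user-supplied extractor whose behavior is not shown here, and the threshold is a parameter so the example can run on small data.

```python
from collections import Counter

def frequent_queries(log_lines, threshold=1000):
    # Stand-in for LogExtractor (assumption: each non-empty line is one query).
    queries = (line.strip() for line in log_lines if line.strip())
    # GROUP BY query, COUNT(*) AS count
    counts = Counter(queries)
    # HAVING count > threshold
    frequent = [(q, c) for q, c in counts.items() if c > threshold]
    # ORDER BY count DESC (ties broken by query text for determinism)
    frequent.sort(key=lambda qc: (-qc[1], qc[0]))
    return frequent

log = ["weather"] * 5 + ["news"] * 3 + ["maps"] * 1
print(frequent_queries(log, threshold=2))
```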

SLIDE 8

SCOPE OPERATORS

Input reading: What is different?

    EXTRACT column[:<type>] [, ...]
    FROM <input_stream(s)>
    USING <Extractor> [(args)]
    [HAVING <predicate>]

Notes (compared with an RDD's textFile, which takes only file names):
  ① Schema information: column names and types
  ② Pluggable function: the Extractor is a class, e.g., a CSV extractor

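A minimal Python sketch of the two ideas noted on this slide: the EXTRACT statement carries schema information (column names and types), and the extractor is a pluggable class. The `Extractor` base class, `CsvExtractor`, and the `extract` helper are hypothetical illustrations, not SCOPE's actual C# interface.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Tuple

Schema = List[Tuple[str, type]]  # (column name, column type) pairs
Row = Dict[str, object]

class Extractor:
    """Hypothetical pluggable extractor: turns raw input into typed rows."""
    def extract(self, stream: io.TextIOBase, schema: Schema) -> Iterator[Row]:
        raise NotImplementedError

class CsvExtractor(Extractor):
    """Hypothetical CSV extractor plugged in via USING."""
    def extract(self, stream, schema):
        for fields in csv.reader(stream):
            # Apply the declared schema: name each field and convert its type.
            yield {name: typ(value) for (name, typ), value in zip(schema, fields)}

def extract(stream, schema: Schema, extractor: Extractor,
            having: Callable[[Row], bool] = lambda row: True) -> List[Row]:
    # EXTRACT <columns> FROM <stream> USING <extractor> [HAVING <predicate>]
    return [row for row in extractor.extract(stream, schema) if having(row)]

data = io.StringIO("alice,19\nbob,34\n")
rows = extract(data, [("name", str), ("age", int)], CsvExtractor(),
               having=lambda r: r["age"] < 21)
print(rows)
```

Swapping in a different `Extractor` subclass changes how bytes become rows without touching the rest of the query, which is the point of making the extractor pluggable.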
SLIDE 9

SQL OPERATORS

  • Select – read rows that satisfy some predicate
  • Join – equijoin with support for inner and outer joins
  • GroupBy – group by some column
  • OrderBy – sort the output
  • Aggregations – COUNT, SUM, MAX, etc.

Notes: these operators cover a large number of analytics operations.

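As a reference point for the Join operator, here is a minimal hash-based equijoin in Python supporting inner and left outer joins. Representing rows as tuples and passing the join-key positions as `key_left`/`key_right` are assumptions for illustration; SCOPE's own join implementation is chosen by its optimizer.

```python
from collections import defaultdict

def hash_equijoin(left, right, key_left, key_right, how="inner"):
    """Join two lists of tuples on left[key_left] == right[key_right]."""
    if how not in ("inner", "left"):
        raise ValueError("how must be 'inner' or 'left'")
    # Build phase: hash the right input on its join key.
    table = defaultdict(list)
    for rrow in right:
        table[rrow[key_right]].append(rrow)
    # Probe phase: stream the left input and look up matches.
    out = []
    right_width = len(right[0]) if right else 0
    for lrow in left:
        matches = table.get(lrow[key_left], [])
        for rrow in matches:
            out.append(lrow + rrow)
        if not matches and how == "left":
            # Left outer join: keep unmatched left rows, padded with None.
            out.append(lrow + (None,) * right_width)
    return out

users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen")]
print(hash_equijoin(users, orders, 0, 0))
print(hash_equijoin(users, orders, 0, 0, how="left"))
```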
SLIDE 10

LANGUAGE INTEGRATION

    R1 = SELECT A+C AS ac, B.Trim() AS B1
         FROM R
         WHERE StringOccurs(C, "xyz") > 2

    #CS
    public static int StringOccurs(string str, string ptrn)
    {
        int cnt = 0;
        int pos = -1;
        while (pos + 1 < str.Length)
        {
            pos = str.IndexOf(ptrn, pos + 1);
            if (pos < 0) break;
            cnt++;
        }
        return cnt;
    }
    #ENDCS

Notes:
  • Trim() comes from the C# standard library
  • StringOccurs is a custom C# function written inline and compiled by the C# compiler
  • Custom functions like this are user-defined functions (UDFs)
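A Python rendering of the inline C# `StringOccurs` UDF and of the query's WHERE clause, to show how a user-defined function plugs into a relational filter. Representing rows as dicts with columns A, B, C is an assumption; like the C# version, this counts overlapping occurrences because each search restarts one position after the previous match.

```python
def string_occurs(s: str, pattern: str) -> int:
    # Mirrors the C# loop: after each hit, search again from pos + 1.
    count = 0
    pos = -1
    while pos + 1 < len(s):
        pos = s.find(pattern, pos + 1)
        if pos < 0:
            break
        count += 1
    return count

def r1(rows):
    # SELECT A+C AS ac, B.Trim() AS B1 FROM R WHERE StringOccurs(C, "xyz") > 2
    # Rows are dicts of string columns (assumption for this sketch).
    return [
        {"ac": row["A"] + row["C"], "B1": row["B"].strip()}
        for row in rows
        if string_occurs(row["C"], "xyz") > 2
    ]

R = [
    {"A": "a-", "B": "  keep  ", "C": "xyzxyzxyz"},
    {"A": "b-", "B": " drop ", "C": "xyz"},
]
print(r1(R))
```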

SLIDE 11

MAPREDUCE-LIKE?

Process, Reduce, Combine

    COMBINE S1 WITH S2
    ON S1.A==S2.A AND S1.B==S2.B AND S1.C==S2.C
    USING MultiSetDifference
    PRODUCE A, B, C

Notes:
  • Process: like map; one operator with one UDF that takes the input rows
  • Reduce: like reduce; an operator applied to each group of rows
  • Combine: like an equi-join, with a UDF over the matching rows from both inputs; can produce many columns
  • Questions: 1. Is it commutative (S1 COMBINE S2 vs. S2 COMBINE S1)? What if combine can be run multiple times?
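A sketch of what the COMBINE example could compute, assuming `MultiSetDifference` returns the rows of S1 that are not cancelled by equal rows of S2 under multiset semantics (the slide names the combiner but does not define it, so this behavior is hypothetical). Rows are grouped on the equi-join key (A, B, C) and the combiner receives the matching groups from both sides. Swapping S1 and S2 changes the result, which bears on the commutativity question above.

```python
def multiset_difference(left_group, right_group):
    # Hypothetical combiner: keep len(left) - len(right) copies, never negative.
    keep = max(0, len(left_group) - len(right_group))
    return left_group[:keep]

def combine(s1, s2, combiner):
    # COMBINE S1 WITH S2 ON S1.A==S2.A AND S1.B==S2.B AND S1.C==S2.C
    def group(rows):
        groups = {}
        for row in rows:
            groups.setdefault((row[0], row[1], row[2]), []).append(row)
        return groups
    g1, g2 = group(s1), group(s2)
    out = []
    # Only keys present in S1 are visited, which suffices for a difference.
    for key, left_rows in g1.items():
        out.extend(combiner(left_rows, g2.get(key, [])))  # PRODUCE A, B, C
    return out

S1 = [(1, "x", 0), (1, "x", 0), (2, "y", 0)]
S2 = [(1, "x", 0), (3, "z", 0)]
print(combine(S1, S2, multiset_difference))
print(combine(S2, S1, multiset_difference))
```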

SLIDE 12

EXECUTION: COMPILER

    SELECT query, COUNT(*) AS count
    FROM "search.log"
    USING LogExtractor
    GROUP BY query
    HAVING count > 1000
    ORDER BY count DESC;

  • Check syntax, resolve names
  • Check that columns have been defined
  • Result: internal parse tree
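The compiler's name-resolution step can be pictured with a toy check: every column the query references must be defined either by the extractor's output schema or by an alias in the query. The schema, the flat list of referenced names, and the `resolve_names` helper are hypothetical simplifications; a real compiler works over the full parse tree.

```python
def resolve_names(schema, referenced, aliases=()):
    """Return the referenced names that are neither schema columns nor aliases."""
    defined = set(schema) | set(aliases)
    return [name for name in referenced if name not in defined]

# Hypothetical output schema of LogExtractor and the names the query uses.
log_schema = ["query", "user", "time"]
query_refs = ["query", "count"]  # GROUP BY query; HAVING/ORDER BY count
print(resolve_names(log_schema, query_refs, aliases=["count"]))
print(resolve_names(log_schema, ["qurey"], aliases=["count"]))
```

An empty list means every name resolved; a non-empty list is the kind of error the compiler reports before any job is launched.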

SLIDE 13

OPTIMIZER

Rewrite the query expression to get the lowest cost (cost-based optimizer). Examples:
  • Removing unnecessary columns (the example query only needs the query column)
  • Pushing down selection predicates (filtering before grouping)
  • Pre-aggregating (similar to a MapReduce combiner)

Also need to reason about partitioning (See VLDBJ paper)

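Pre-aggregation is easy to check on a toy example: each partition computes partial counts locally (like a MapReduce combiner), the partials are merged, and the HAVING filter runs only after the merge, because a per-partition count can fall below the threshold even when the global count exceeds it. The partition contents are made up for illustration.

```python
from collections import Counter

def count_with_preaggregation(partitions, threshold):
    # Local step: each partition counts its own queries (combiner-like).
    partials = [Counter(p) for p in partitions]
    # Global step: merge the partial counts; fewer records need to be shuffled.
    total = Counter()
    for partial in partials:
        total.update(partial)
    # HAVING must run after the merge, not on the partial counts.
    return {q: c for q, c in total.items() if c > threshold}

parts = [["news", "news", "maps"], ["news", "maps"], ["news"]]
print(count_with_preaggregation(parts, threshold=2))
```

Here "news" appears at most twice in any single partition, so filtering the partials with `count > 2` would wrongly drop it.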
SLIDE 14

RUNTIME OPTIMIZATIONS

  • Hierarchical aggregation: not all links have the same bandwidth, so aggregate within a rack first, then across racks
  • Locality-sensitive task placement: similar to Spark/MR, which also do this
  • Grouping heuristics? (vague in the paper)
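A sketch of hierarchical aggregation under an assumed topology: machines are grouped into racks, each rack first merges its machines' partial counts, and only one merged result per rack moves on to the top-level aggregation. The rack layout and the transfer counting are illustrative assumptions, not SCOPE's actual scheduler.

```python
from collections import Counter

def hierarchical_aggregate(racks):
    """racks: list of racks, each a list of per-machine Counters (assumed layout)."""
    cross_rack_transfers = 0
    rack_totals = []
    for machines in racks:
        rack_total = Counter()
        for partial in machines:
            rack_total.update(partial)  # merge stays within the rack
        rack_totals.append(rack_total)
        cross_rack_transfers += 1  # one merged result leaves each rack
    final = Counter()
    for rack_total in rack_totals:
        final.update(rack_total)
    return final, cross_rack_transfers

racks = [
    [Counter(news=2), Counter(news=1, maps=1)],
    [Counter(maps=3), Counter(news=1), Counter(maps=1)],
]
final, transfers = hierarchical_aggregate(racks)
print(dict(final), transfers)
```

With flat aggregation, each of the five machines would send its partial result to a single aggregator; here only two merged results, one per rack, reach the top level.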

SLIDE 15

SUMMARY, TAKEAWAYS

Relational API

  • Enables rich space of optimizations
  • Easy to use, integration with C#

Scope Execution

  • Compiler to check for errors, generate DAG
  • Optimizer to accelerate queries (static + dynamic)

Precursor to systems like Spark SQL

Notes: schema, UDFs

SLIDE 16

DISCUSSION

https://forms.gle/hL8VJ6uSG7Lzm164A

SLIDE 17

Consider you have a column-oriented data layout on your storage system (Example below). What are some reasons that a SCOPE query might be faster than running equivalent MR program?

http://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-parquet-and-orc-do-we.html

Notes:
  • Example columnar format: Apache Parquet
  • Notion of row offsets
  • The Extractor can work with the storage format, so pre-filtering via the extractor is easier
  • The query touches a single column
  • If the MR program's input reader did this, it would be as efficient as well
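To see why a column-oriented layout can help, here is a toy comparison assuming a table stored either row-wise (a list of records) or column-wise (one list per column). A query that touches only the `age` column reads one column in the columnar layout, and an extractor that understands the layout can apply the filter while reading. The table, the "cells read" cost measure, and both helpers are illustrative assumptions.

```python
# Toy table (illustrative data).
ROWS = [("alice", 19, "WI"), ("bob", 34, "IL"), ("carol", 20, "WI")]
COLUMNS = {
    "name": [r[0] for r in ROWS],
    "age": [r[1] for r in ROWS],
    "state": [r[2] for r in ROWS],
}

def count_young_row_layout(rows):
    # Row layout: every field of every record is read, even unused ones.
    cells_read = sum(len(r) for r in rows)
    count = sum(1 for r in rows if r[1] < 21)
    return count, cells_read

def count_young_column_layout(columns):
    # Column layout: read only the "age" column and filter it directly.
    ages = columns["age"]
    count = sum(1 for a in ages if a < 21)
    return count, len(ages)

print(count_young_row_layout(ROWS), count_young_column_layout(COLUMNS))
```

Both layouts give the same answer; the difference is how much data the reader has to touch, which grows with the number of unused columns.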

SLIDE 18

Does a SCOPE-like optimizer help ML workloads? Consider the code in your Assignment 2. What parts of your code would benefit and what parts would not?

Notes:
  • Column filtering: feature extraction!
  • Joins in ML workloads: rare?
  • Other optimizations?
  • No details about caching intermediate outputs
  • Optimizer picks the join algorithm: hash join vs. sort-merge join
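Since the choice between hash join and sort-merge join comes up here, a minimal sort-merge equijoin in Python for comparison. Both inputs are sorted on the key and scanned with two pointers; runs of equal keys on both sides produce their cross product. Tuples keyed on their first field are an illustrative assumption.

```python
def sort_merge_join(left, right):
    """Inner equijoin of two lists of tuples on their first field."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit all pairs.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    out.append(lrow + rrow)
            i, j = i_end, j_end
    return out

users = [(2, "bob"), (1, "alice"), (3, "carol")]
orders = [(1, "book"), (3, "pen"), (1, "lamp")]
print(sort_merge_join(users, orders))
```

Sort-merge pays for sorting up front but streams both inputs afterwards and produces sorted output; a hash join avoids the sort but must hold one input's hash table in memory.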

SLIDE 19

NEXT STEPS

  • Next class: Elastic Data Warehousing with Snowflake
  • Project proposals due tomorrow! See Piazza!
  • Midterm coming up!