
CS 744: SPLIT ANNOTATIONS (Shivaram Venkataraman, Fall 2020)



  1. CS 744: SPLIT ANNOTATIONS. Shivaram Venkataraman, Fall 2020.

  2. ADMINISTRIVIA: Course project check-ins due tomorrow on HotCRP. In-class project presentations Dec 8th and Dec 10th; short presentation slots (a few minutes each); sign-up sheet on Piazza; upload slides in advance.

  3. CONTEXT: Cloud computing across the semester: new hardware and new data models, and how to compose libraries while maintaining efficiency.

  4. SETTING: Workloads on multi-core machines compose multiple functions and libraries (e.g., Intel MKL):

     // inputs are double arrays with `len` elems
     vdLog1p(len, d1, d1);          // d1 = log(1 + d1)
     vdAdd(len, d1, tmp, d1);       // d1 = d1 + tmp
     vdDiv(len, d1, vol_sqrt, d1);  // d1 = d1 / vol_sqrt

     Data movement across operators is expensive even within a single machine: arrays larger than the CPU cache are streamed to and from DRAM on every call, so each operator pays full DRAM reads and writes; the cost disappears only if the data fits in cache.
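A minimal NumPy sketch (illustrative names, not from the slides) of the same three-operator pipeline, contrasting whole-array execution, which streams every array through DRAM on each call, with manually chunked execution that keeps one cache-sized piece hot across all three operators:

    import numpy as np

    def ops(d1, tmp, vol_sqrt):
        # Three element-wise passes, mirroring the MKL calls above.
        d1 = np.log1p(d1)       # d1 = log(1 + d1)
        d1 = d1 + tmp           # d1 = d1 + tmp
        return d1 / vol_sqrt    # d1 = d1 / vol_sqrt

    def whole_array(d1, tmp, vol_sqrt):
        # Each call scans the full arrays; when they exceed the CPU cache,
        # every intermediate is written to and re-read from DRAM.
        return ops(d1, tmp, vol_sqrt)

    def chunked(d1, tmp, vol_sqrt, chunk=1 << 16):
        # Run all three operators on one cache-sized chunk at a time so the
        # intermediates stay in cache (the effect split annotations automate).
        out = np.empty_like(d1)
        for i in range(0, len(d1), chunk):
            s = slice(i, i + chunk)
            out[s] = ops(d1[s], tmp[s], vol_sqrt[s])
        return out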

  5. COMPILER-BASED APPROACHES: Replace every library call with one that emits an intermediate representation (IR), then compile all the IR together. This is what we would like, because a shared IR enables rich optimizations such as loop fusion and pipelining (e.g., Weld). The downside: it requires lots of code change to existing libraries (e.g., Pandas).

  6. GOALS: Provide data movement optimizations across libraries; require minimal or no changes to existing libraries (not intrusive); leverage existing hand-tuned code (e.g., matrix multiply, FFT) for speedups.

  7. APPROACH: Build an execution graph, split the inputs into cache-sized pieces, and pass each piece through every function in the pipeline. Example: d1 = price * strike; d1 = np.log2(d1) + strike.

  8. SPLIT ANNOTATIONS: Given a library, it is easier to annotate its data types than to change its code (there are fewer data types than operators, and no code changes are required).

     @splittable(size: SizeSplit(size), a: ArraySplit(size), mut out: ArraySplit(size))
     void vdLog1p(long size, double *a, double *out)

     A split type describes how a value is broken into pieces: ⟨V0 ... Vn⟩, e.g., ArraySplit⟨10, 2⟩ splits a 10-element array into 2 pieces. A split annotation assigns a name and a split type to each argument and return value; here the output `out` is split in the same fashion as the input `a`. Annotations are written once per library by an expert. Question: can vdLog1p be pipelined with vdScale(long size, int scalar, double *a)?
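A hypothetical Python sketch of the two split types used in this annotation (names and behavior inferred from the slide; not Mozart's actual implementation): each split type knows how to break its value into pieces and, for outputs, how to merge pieces back together.

    import numpy as np

    class ArraySplit:
        # Splits an array of `size` elements into contiguous pieces.
        def __init__(self, size):
            self.size = size

        def split(self, array, num_pieces):
            # e.g. ArraySplit<10, 2>: a 10-element array becomes 2 pieces of 5.
            return np.array_split(array, num_pieces)

        def merge(self, pieces):
            return np.concatenate(pieces)

    class SizeSplit:
        # A scalar length argument must shrink to match each array piece.
        def __init__(self, size):
            self.size = size

        def split(self, length, num_pieces):
            base, rem = divmod(length, num_pieces)
            return [base + (1 if i < rem else 0) for i in range(num_pieces)]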

  9. IMPLEMENTING THE SPLIT API: Split types determine when pipelining is safe. If two parameters share the same split type, their pieces line up and the calls can be pipelined (e.g., log followed by multiply); if the next call cannot be pipelined, the results of the prior call are merged first.

     @splittable(m: MatrixSplit(m, axis), axis: _) -> ReduceSplit(axis)
     vector sumReduceToVector(matrix m, int axis);

     Reductions are the classic case that needs a merge function: each piece produces a partial output, and the split type's merge operation combines the partial outputs into the final result.
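A hypothetical sketch of a reduction split type (illustrative, not Mozart's code), assuming the matrix is split along the reduced axis so that each piece yields a partial vector and the merge function sums the partials:

    import numpy as np

    class ReduceSplit:
        # Split type for results that must be reduced across pieces.
        def __init__(self, axis):
            self.axis = axis

        def merge(self, partial_vectors):
            # sumReduceToVector on each matrix piece produces a partial vector;
            # summing the partials equals reducing the unsplit matrix.
            return np.sum(partial_vectors, axis=0)

    # Usage sketch: pieces of a matrix split along axis 0 are reduced
    # independently, then merged.
    m = np.arange(12).reshape(4, 3)
    partials = [p.sum(axis=0) for p in np.array_split(m, 2, axis=0)]
    assert np.array_equal(ReduceSplit(axis=0).merge(partials), m.sum(axis=0))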

  10. MOZART DESIGN: Capture the execution graph and evaluate it lazily; evaluating the whole graph at once gives the maximum opportunity to pipeline.

  11. PYTHON CLIENT LIBRARY: Works with libraries that already exist (e.g., Pandas). Writing annotations: function decorators.

     @sa((DataFrameSplit(), DataFrameSplit()), {}, DataFrameSplit())
     def divide(series, value): ...

     Capturing the graph: the decorator wraps the original Python function, registers each call in the graph (constructed internally), and returns a Future object, similar in spirit to Ray or PyWren, so calls like `divide` can be intercepted. Evaluation points: evaluation is lazy, triggered by overriding __getattribute__ on the Future; e.g., calling print on a Future[DataFrame] forces the graph to run and then prints the result.
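A minimal sketch of this lazy-capture mechanism (assumed structure, not the actual Mozart client library; split-type arguments to @sa are accepted but ignored here): the decorator records each call in a task graph and returns a Future, and touching the Future forces the graph.

    TASK_GRAPH = []   # (function, args, future) in call order

    class Future:
        def __init__(self):
            object.__setattr__(self, "_value", None)

        def _force(self):
            if object.__getattribute__(self, "_value") is None:
                # Call order is a valid topological order; a real runtime
                # would split, pipeline, and merge while executing this plan.
                for func, args, fut in TASK_GRAPH:
                    resolved = [a._force() if isinstance(a, Future) else a
                                for a in args]
                    object.__setattr__(fut, "_value", func(*resolved))
            return object.__getattribute__(self, "_value")

        def __getattribute__(self, name):
            if name.startswith("_"):
                return object.__getattribute__(self, name)
            # Any other attribute access is an evaluation point.
            return getattr(object.__getattribute__(self, "_force")(), name)

        def __repr__(self):
            # print(future) forces the graph and shows the real result.
            return repr(object.__getattribute__(self, "_force")())

    def sa(*_split_annotations):
        def decorate(func):
            def wrapper(*args):
                fut = Future()
                TASK_GRAPH.append((func, args, fut))
                return fut
            return wrapper
        return decorate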

  12. MOZART RUNTIME: Turn the dataflow graph into an execution plan: a series of stages, where each stage splits its inputs, pipelines the pieces through its functions, and merges the outputs. Choosing a batch size: set the number of elements per batch from the L2 cache size, i.e., the number of elements that will fit in L2.
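A small worked example of that batch-size rule (cache size and the number of live arrays per call are assumed values, not figures from the paper):

    def elements_per_batch(l2_bytes=1 << 20, element_bytes=8, live_arrays=3):
        # Keep the working set of one pipelined piece (all the arrays an
        # operator touches at once) within the L2 cache.
        return l2_bytes // (element_bytes * live_arrays)

    # A 1 MiB L2 cache, 8-byte doubles, and 3 live arrays per call give
    # roughly 43,000 elements per batch.
    print(elements_per_batch())   # 43690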

  13. SUMMARY: Iterative workloads compose data processing libraries, and data movement is the bottleneck on multi-core machines. Key idea: split data into cache-sized pieces and pipeline them across functions (and, potentially, across iterations). Split annotations reduce programmer effort; Mozart is the client library and runtime for lazy evaluation.

  14. DISCUSSION https://forms.gle/F2LJ21qFkBGWyypB7

  15. How does the dataflow graph executed by Mozart compare to the dataflow graphs we have seen in other systems like Spark or PyTorch? Similarities: lazy execution, narrow dependencies, pipelined stages. Differences: fault tolerance is not an objective, so Mozart does no checkpointing; functions are black boxes, so Mozart cannot, for example, pick an optimal join operator; data is merged between stages rather than shuffled.

  16. Why not just use more threads to increase effective memory bandwidth? Compute-intensive functions (e.g., exp) do speed up with more threads, but cheap element-wise operations (e.g., add) are bound by memory bandwidth and do not; how much the overall pipeline speeds up depends on the mix of compute-bound and memory-bound functions.

  17. NEXT STEPS: Next class: TPU. Project check-ins on HotCRP!
