Weld: A Common Runtime for Data Analytics



  1. Weld: A Common Runtime for Data Analytics Shoumik Palkar, James Thomas, Anil Shanbhag*, Deepak Narayanan, Malte Schwarzkopf*, Holger Pirk*, Saman Amarasinghe*, Matei Zaharia Stanford InfoLab, *MIT CSAIL

  2. Motivation
     Modern data apps combine many disjoint processing libraries and functions
     » Relational, statistics, machine learning, …
     » E.g. PyData stack
     + Great results leveraging the work of 1000s of authors
     - No optimization across these functions

  3. How Bad is This Problem?
     The growing gap between memory and processing speeds makes the traditional way of combining functions worse
         data = pandas.parse_csv(string)
         filtered = pandas.dropna(data)
         avg = numpy.mean(filtered)
     [Diagram: parse_csv, dropna, and mean shown as separate stages]
     5-30x slowdowns in NumPy, Pandas, TensorFlow, etc.
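To make the cost concrete, here is a minimal pure-Python sketch of the pipeline on this slide. It does not use pandas or NumPy; the helpers `parse_csv`, `dropna`, `mean`, and `fused_mean` are hypothetical stand-ins named after the slide's schematic calls (pandas's actual reader is `read_csv`). Each helper makes its own full pass and materializes an intermediate list, while the hand-fused version computes the same average in one pass. The input format (one number per line, blank line for a missing value) is an assumption made for the example.

```python
import math

def parse_csv(text):
    """Pass 1: parse one float per line; blank lines become NaN."""
    return [float(tok) if tok.strip() else math.nan for tok in text.splitlines()]

def dropna(values):
    """Pass 2: materialize a second list without the NaNs."""
    return [v for v in values if not math.isnan(v)]

def mean(values):
    """Pass 3: scan the filtered list once more to average it."""
    return sum(values) / len(values)

def fused_mean(text):
    """Single pass: parse, filter, and accumulate with no intermediate lists."""
    total, count = 0.0, 0
    for tok in text.splitlines():
        if tok.strip():          # skip missing values as we go
            total += float(tok)
            count += 1
    return total / count

csv_text = "1.0\n\n3.0\n5.0\n"
unfused = mean(dropna(parse_csv(csv_text)))   # three passes, two intermediates
fused = fused_mean(csv_text)                  # one pass, same answer
```

Both versions return the same value; the difference is how often the data travels through memory, which is the overhead the slide attributes to composing independent library functions.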

  4. How We Solve This
     [Diagram: libraries (SQL, machine learning, graph algorithms, …) sit on top of a Common Runtime, which targets hardware (CPU, GPU, …)]

  5. How We Solve This
     [Diagram: libraries (SQL, machine learning, graph algorithms, …) call the Runtime API; the Weld runtime contains the Weld IR, the Optimizer, and Backends (CPU, GPU, …)]

  6. Runtime API
     Uses lazy evaluation to collect work across libraries
     User application:
         data = lib1.f1()
         lib2.map(data, item => lib3.f2(item))
     [Diagram: User Application → Runtime API → Weld Runtime. IR fragments for each function (f1, map, f2) → combined IR program → optimized machine code; data in application]
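The lazy-evaluation idea on this slide can be sketched in a few lines of Python. This is an illustration only, not Weld's actual API: the `Lazy` class and the `lib1_f1`, `lib2_map`, and `lib3_f2` functions are hypothetical, and plain Python callables stand in for IR fragments. Real Weld would optimize and compile the combined program; here evaluation simply runs the recorded callables.

```python
class Lazy:
    """A deferred value: records a fragment of work instead of running it."""
    def __init__(self, op, deps=(), fn=None):
        self.op = op        # name of the recorded operation
        self.deps = deps    # upstream Lazy values this one consumes
        self.fn = fn        # callable standing in for an IR fragment

    def fragments(self):
        """List operation names in dependency order (the combined program)."""
        out = []
        for dep in self.deps:
            out.extend(dep.fragments())
        out.append(self.op)
        return out

    def evaluate(self):
        """Run the whole recorded graph only when a result is requested."""
        args = [dep.evaluate() for dep in self.deps]
        return self.fn(*args)

# Hypothetical libraries that register work rather than executing it.
def lib1_f1():
    return Lazy("f1", fn=lambda: [1, 2, 3])

def lib3_f2(item):
    return item * 10

def lib2_map(data, f):
    return Lazy(f"map({f.__name__})", deps=(data,),
                fn=lambda xs: [f(x) for x in xs])

data = lib1_f1()
result = lib2_map(data, lib3_f2)   # nothing has executed yet
plan = result.fragments()          # work collected across both libraries
values = result.evaluate()         # forces execution of the combined graph
```

Because the runtime sees `f1` and `map` together before anything runs, it has the chance to optimize across the library boundary, which is the point of deferring execution.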

  7. Weld IR
     Designed to meet three goals:
     1. Library composition: support complete workloads such as nested parallel calls
     2. Ability to express optimizations: e.g. loop fusion, vectorization, loop tiling
     3. Explicit parallelism

  8. Weld IR
     Small, powerful design inspired by “monad comprehensions”
     Parallel loops: iterate over a dataset
     Builders: declarative objects for producing results
     » E.g. append items to a list, compute a sum
     » Can be implemented differently on different hardware
     Captures relational algebra, functional APIs like Spark, linear algebra, and composition thereof

  9. Examples
     Implement functional operators using builders
         def map(data, f):
           builder = new vecbuilder[int]
           for x in data:
             merge(builder, f(x))
           result(builder)

         def reduce(data, zero, func):
           builder = new merger[zero, func]
           for x in data:
             merge(builder, x)
           result(builder)
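As a rough Python analogue of the builder pseudocode on this slide, the sketch below models the two builders as small classes with `merge` and `result` methods. The class names `VecBuilder` and `Merger` and the functions `weld_map` and `weld_reduce` are hypothetical and chosen to mirror the slide; this is sequential Python, not Weld IR, so it shows only the interface, not the parallel execution.

```python
import operator

class VecBuilder:
    """Builder that appends each merged item to a list (cf. vecbuilder)."""
    def __init__(self):
        self._items = []

    def merge(self, value):
        self._items.append(value)

    def result(self):
        return list(self._items)

class Merger:
    """Builder that folds merged items with a binary function (cf. merger)."""
    def __init__(self, zero, func):
        self._acc = zero
        self._func = func

    def merge(self, value):
        self._acc = self._func(self._acc, value)

    def result(self):
        return self._acc

def weld_map(data, f):
    builder = VecBuilder()
    for x in data:
        builder.merge(f(x))
    return builder.result()

def weld_reduce(data, zero, func):
    builder = Merger(zero, func)
    for x in data:
        builder.merge(x)
    return builder.result()

squares = weld_map([1, 2, 3], lambda x: x * x)
total = weld_reduce([1, 2, 3], 0, operator.add)
```

The loop bodies only call `merge`, so the builder is free to decide how results are accumulated; that separation is what lets a builder be implemented differently on different hardware.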

  10. Example Optimization: Fusion
          squares = map(data, x => x * x)
          sum = reduce(data, 0, +)
      Loops can be merged into one pass over data:
          bld1 = new vecbuilder[int]
          bld2 = new merger[0, +]
          for x in data:
            merge(bld1, x * x)
            merge(bld2, x)
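The effect of fusion can be checked with a small Python sketch that counts passes over the input. The `CountingData` wrapper and the `unfused` and `fused` functions are hypothetical helpers written for this illustration; a plain list and an integer stand in for the slide's `vecbuilder[int]` and `merger[0, +]`.

```python
class CountingData:
    """Wraps a list and counts how many full passes are started over it."""
    def __init__(self, items):
        self.items = items
        self.passes = 0

    def __iter__(self):
        self.passes += 1
        return iter(self.items)

def unfused(data):
    squares = [x * x for x in data]   # pass 1: the map's loop
    total = 0
    for x in data:                    # pass 2: the reduce's loop
        total += x
    return squares, total

def fused(data):
    bld1 = []                         # stands in for vecbuilder[int]
    bld2 = 0                          # stands in for merger[0, +]
    for x in data:                    # one pass feeds both builders
        bld1.append(x * x)
        bld2 += x
    return bld1, bld2

a = CountingData([1, 2, 3, 4])
b = CountingData([1, 2, 3, 4])
unfused_result = unfused(a)
fused_result = fused(b)
```

Both functions return the same pair of results, but `a.passes` ends at 2 while `b.passes` ends at 1. On data that does not fit in cache, halving the number of passes is where the savings would come from.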

  11. Implementation
      Prototype with APIs in Scala and Python
      » LLVM and Voodoo for code gen
      Integrations: TensorFlow, NumPy, Pandas, Spark

  12. Results: Individual Workloads
      [Figure: runtime in seconds. Panels: SQL (TPC-H) queries Q1, Q3, Q6, Q12 (HyPer, Weld, H.o.); PageRank (GraphMat, Hand-opt, Weld); Word2Vec (TF, TF-Op, Weld; TF-Op = C++ operator). Axes: Runtime [secs], Number of threads.]

  13. Results: Existing Frameworks
      [Figure: runtime in seconds, one axis on a log10 scale. Panels: TPC-H, Vector Sum, Logistic Regression. Legend entries: SparkSQL, TF, Weld, Hand-opt, NP, NExpr. Workloads: TPC-H Q1, TPC-H Q6, 1 Core, 12 Cores, LR (1T), LR (12T).]
      Integration effort: 500 lines glue, 30 lines/operator

  14. Results: Cross-Library Optimization
      [Figure: two panels. Pandas + NumPy (Current; Weld, no CLO; Weld, CLO; Weld, 12 core), axis: Runtime (sec, log10). Spark SQL UDF (Scala UDF; Weld), axis: Runtime (sec). Speedup annotations on the plots: 31x, 290x, 14x.]

  15. Conclusion
      The way we compose software will have to change to efficiently use modern hardware
      Weld is our first attempt at such a design, with lots of open questions!
      » Optimization, specialized hardware, domain info, …
      Open source: this spring
      We’re hiring! (postdocs)
