Weld: A Common Runtime for Data Analytics
Shoumik Palkar, James Thomas, Anil Shanbhag*, Deepak Narayanan, Malte Schwarzkopf*, Holger Pirk*, Saman Amarasinghe*, Matei Zaharia
Stanford InfoLab, *MIT CSAIL
data = pandas.parse_csv(string)
filtered = pandas.dropna(data)
avg = numpy.mean(filtered)
[Figure: pipeline stages parse_csv, dropna, mean; hardware targets CPU, GPU]
data = lib1.f1()
lib2.map(data, item => lib3.f2(item))
[Figure: architecture diagram. Labels: User Application; Weld Runtime; Runtime API; IR fragments for each function (f1, map, f2); Combined IR program; Optimized machine code; Data in application]
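The flow named in the diagram labels (per-function IR fragments, then one combined program) can be imitated in a few lines of Python. This is only a toy sketch, not Weld's actual API: `LazyValue`, `lazy_map`, `combined_ir`, and `evaluate` are hypothetical names, and the "IR fragments" here are plain strings that are collected rather than compiled.

```python
class LazyValue:
    """A not-yet-computed value (hypothetical; not part of Weld)."""

    def __init__(self, fragment, thunk, deps=()):
        self.fragment = fragment  # textual stand-in for an IR fragment
        self.thunk = thunk        # how to compute the value when forced
        self.deps = deps          # lazy values this one depends on

    def combined_ir(self):
        """Collect the fragments of all dependencies, then this one."""
        lines = []
        for dep in self.deps:
            lines.extend(dep.combined_ir())
        lines.append(self.fragment)
        return lines

    def evaluate(self):
        """Force the computation; no work happens before this call."""
        return self.thunk()


def f1():
    # Plays the role of lib1.f1: registers a fragment, does no work yet.
    return LazyValue("f1: produce [1, 2, 3]", lambda: [1, 2, 3])


def f2(item):
    # Plays the role of lib3.f2: an element-wise function.
    return item * 10


def lazy_map(data, fn):
    # Plays the role of lib2.map: depends on `data`, still does no work.
    return LazyValue(
        f"map: apply {fn.__name__}",
        lambda: [fn(x) for x in data.evaluate()],
        deps=(data,),
    )


data = f1()
result = lazy_map(data, f2)
ir = result.combined_ir()   # fragments from both "libraries", in order
values = result.evaluate()  # the computation runs only here
```

Because every function only records what it would do, a runtime sitting behind `evaluate` gets to see the whole program before any of it runs, which is the precondition for optimizing across library boundaries.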
def map(data, f):
    builder = new vecbuilder[int]
    for x in data:
        merge(builder, f(x))
    result(builder)

def reduce(data, zero, func):
    builder = new merger[zero, func]
    for x in data:
        merge(builder, x)
    result(builder)
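The builder pseudocode above can be approximated in ordinary Python. The sketch below is an assumption-laden stand-in, not Weld's implementation: `VecBuilder` and `Merger` are hypothetical classes mirroring `vecbuilder` and `merger`, and `map_` and `reduce_` carry trailing underscores only to avoid shadowing Python built-ins.

```python
import operator


class VecBuilder:
    """Collects merged values into a list (stand-in for vecbuilder)."""

    def __init__(self):
        self._items = []

    def merge(self, value):
        self._items.append(value)

    def result(self):
        return list(self._items)


class Merger:
    """Folds merged values with a binary function (stand-in for merger)."""

    def __init__(self, zero, func):
        self._acc = zero
        self._func = func

    def merge(self, value):
        self._acc = self._func(self._acc, value)

    def result(self):
        return self._acc


def map_(data, f):
    builder = VecBuilder()
    for x in data:
        builder.merge(f(x))
    return builder.result()


def reduce_(data, zero, func):
    builder = Merger(zero, func)
    for x in data:
        builder.merge(x)
    return builder.result()


squares = map_([1, 2, 3], lambda x: x * x)
total = reduce_([1, 2, 3], 0, operator.add)
```

Both operators share the same shape: create a builder, merge values into it inside a loop, then ask for the result. That uniform shape is what makes it possible to combine loops written by different libraries.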
squares = map(data, x => x * x)
sum = reduce(data, 0, +)

bld1 = new vecbuilder[int]
bld2 = new merger[0, +]
for x in data:
    merge(bld1, x * x)
    merge(bld2, x)
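The effect of the fused loop can be checked with a small Python sketch. It is illustrative only: `unfused` and `fused` are hypothetical helper names, a plain list and an integer accumulator stand in for `bld1` and `bld2`, and the returned pass count is bookkeeping added here to make the difference visible.

```python
def unfused(data):
    """Run map and reduce as two separate loops over the data."""
    squares = []
    for x in data:          # first pass: map
        squares.append(x * x)
    total = 0
    for x in data:          # second pass: reduce
        total += x
    return squares, total, 2  # 2 = number of passes over data


def fused(data):
    """Run both operations in a single loop, as in the fused pseudocode."""
    squares = []  # plays the role of bld1 (vecbuilder[int])
    total = 0     # plays the role of bld2 (merger[0, +])
    for x in data:
        squares.append(x * x)  # merge(bld1, x * x)
        total += x             # merge(bld2, x)
    return squares, total, 1   # 1 = number of passes over data


sample = [1, 2, 3, 4]
two_pass = unfused(sample)
one_pass = fused(sample)
```

Both versions produce the same squares and the same sum; the fused version reads `data` once instead of twice, which matters when the data is too large to stay in cache between passes.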
[Figure: runtime comparisons for three workloads. SQL (TPC-H), queries Q1, Q3, Q6, Q12: runtime (secs) vs. number of threads (1, 4, 12) for HyPer, hand-optimized (H.o.), and Weld. PageRank: runtime (secs) vs. number of threads (1, 2, 4, 8, 12) for GraphMat, hand-optimized, and Weld. Word2Vec: runtime (secs) for TF, TF-Op, and Weld, where TF-Op = C++ operator.]
[Figure: runtime comparisons for three workloads. TPC-H (Q1, Q6): runtime (secs) for SparkSQL and Weld. Vector Sum: runtime (secs) for NP, NExpr, and Weld. Logistic Regression, LR (1T) and LR (12T): runtime (secs, log10) for TF, hand-optimized, and Weld.]
[Figure: runtime (sec, log10) on 1 core and 12 cores for Current; Weld, no CLO; Weld, CLO; and Weld, 12 core. Annotated speedups: 290x and 31x.]
[Figure: runtime (sec) for Scala UDF and Weld. Annotated speedup: 14x.]