Accelerating Cross-Validation in Spark Using GPU, Minsik Cho, Rajesh Bordawekar, IBM TJW Research (PowerPoint presentation)

SLIDE 1

Accelerating Cross-Validation in Spark Using GPU

Minsik Cho, Rajesh Bordawekar
IBM TJW Research

SLIDE 2

Cross-Validation 101

[Wikipedia]

  • A popular model validation technique

– Helps avoid overfitting and improves generalization
– Useful when there is not enough data for a separate validation set
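The slide only names the technique. As a minimal sketch of k-fold splitting in plain Python, assuming a reproducible shuffle followed by k nearly equal folds (`k_fold_indices` is a hypothetical helper, not code from the talk):

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_idx, val_idx) pairs; every sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle before splitting
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    splits = []
    for f in range(k):
        # Train on every fold except f, validate on fold f.
        train = [i for g in range(k) if g != f for i in folds[g]]
        splits.append((train, folds[f]))
    return splits

for train, val in k_fold_indices(10, 4):
    print(len(train), len(val))  # 7 3, 7 3, 8 2, 8 2
```

Each of the k splits yields one model fit, which is why the number of problems grows with the number of folds.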

SLIDE 3

Cross-Validation + Elastic Net Regression

  • Cross-validation is commonly used with

– Linear/logistic regression
– Elastic net regularization

  • A large number of problems to solve

– Number of folds from cross-validation
– Many lambdas to search for the best prediction model
– 4 folds × 1000 lambdas = 4000 regressions to fit

[Wikipedia]

Tons of problems to crunch
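The slides cite Wikipedia for the elastic net but do not write out the objective. For reference, one common parameterization (the glmnet-style form for linear regression; the exact scaling used in this work is an assumption) is shown below, together with the problem count, where each (fold, λ) pair is a separate fit:

```latex
\begin{aligned}
&\min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
 + \lambda\Bigl[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\Bigr],
 \quad 0\le\alpha\le 1,\\
&\#\text{problems} = \#\text{folds}\times\#\lambda\ \text{values} = 4\times 1000 = 4000.
\end{aligned}
```

Here α = 1 gives the lasso and α = 0 gives ridge regression; for logistic regression the squared-error term is replaced by the negative log-likelihood.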

SLIDE 4

Apache Spark Overview

  • In-memory engine for large-scale distributed data processing

– Used for databases, streaming, machine/deep learning, and graph processing
– Supports high-level APIs in Java, Scala, Python, and R

  • RDD: resilient distributed dataset

– Partitioned collection of records
– Spread across the cluster
– Dataset can be cached in memory

SLIDE 5

Spark GPU Acceleration

[Rajesh, oreilly.com]

  • Accelerates compute-intensive workloads with GPUs

SLIDE 6

Cross-Validation in Spark

  • For each problem

– Create RDD
– Distribute RDD
– Call optimizer
– Return model

[Berkeley]

SLIDE 7

Cross-Validation in Spark

Is this the best approach for GPUs?

[Figure: the Dataset is split into a partitioned RDD (Dataset i, j, k on worker i, j, k), and a Reduce combines the partial results into One Model.]
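The conventional pattern can be sketched in plain Python without Spark. Assumptions: a least-squares loss, plain gradient descent as the optimizer, and Python lists standing in for RDD partitions; none of these names come from the slides. The point is that every iteration of every problem needs a map over all partitions plus a reduce back to one model:

```python
from functools import reduce

def partial_gradient(partition, w):
    """Gradient contribution of one partition for loss (1/2n) * sum((x.w - y)^2), before dividing by n."""
    grad = [0.0] * len(w)
    for x, y in partition:
        err = sum(xi * wi for xi, wi in zip(x, w)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad

def data_parallel_step(partitions, w, lr):
    """One step: each worker computes a partial gradient, a reduce sums them, the driver updates one model."""
    partials = [partial_gradient(p, w) for p in partitions]                    # map: one task per partition
    total = reduce(lambda a, b: [ai + bi for ai, bi in zip(a, b)], partials)  # reduce
    n = sum(len(p) for p in partitions)
    return [wi - lr * gi / n for wi, gi in zip(w, total)]

# Toy data y = 1 + 2t with a bias feature, split over two "workers".
partitions = [[([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
              [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]]
w = [0.0, 0.0]
for _ in range(2000):  # 2000 map+reduce rounds for a single problem
    w = data_parallel_step(partitions, w, lr=0.1)
print([round(v, 4) for v in w])  # approaches [1.0, 2.0]
```

With thousands of (fold, λ) problems, this per-iteration communication is repeated for each one, which motivates the question on the slide.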

SLIDE 8

Proposed Cross-Validation in Spark Using GPU

  • Broadcast data

– Cross-validation reuses the same original dataset for every problem

  • RDD of problem instances, not of data

– Many problems with different folds/lambdas

  • Maximize concurrent GPU streams to minimize GPU idle time

SLIDE 9

Cross-Validation in Spark Using GPU

[Figure: the Dataset is broadcast as an array to worker i, worker j, and worker k, so each holds a full copy; the Problems are shown alongside.]

SLIDE 10

Cross-Validation in Spark Using GPU

[Figure: workers i, j, and k each hold the broadcast Dataset, while the Problems are distributed as an RDD (Problems i, j, k on worker i, j, k).]

  • Problems in RDD

SLIDE 11

Code Snippet

Build a problem set
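The code on this slide is not preserved in the text. A minimal Python sketch of what building a problem set could look like, assuming one problem per (fold, λ) pair and a log-spaced λ grid from λ_max down to 0.001·λ_max (a glmnet-style choice assumed here); `Problem`, `lambda_path`, and `build_problems` are hypothetical names:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Problem:
    """One independent fit (hypothetical type): train on all folds except `fold`, with penalty `lam`."""
    pid: int
    fold: int
    lam: float

def lambda_path(lam_max, n_lambdas, eps=1e-3):
    """Log-spaced grid from lam_max down to eps * lam_max (assumed grid, not from the slides)."""
    if n_lambdas == 1:
        return [lam_max]
    ratio = eps ** (1.0 / (n_lambdas - 1))  # constant ratio between neighbouring lambdas
    return [lam_max * ratio ** i for i in range(n_lambdas)]

def build_problems(n_folds, lambdas):
    """Cartesian product of folds and lambdas, numbered consecutively."""
    return [Problem(pid, fold, lam)
            for pid, (fold, lam) in enumerate(product(range(n_folds), lambdas))]

problems = build_problems(4, lambda_path(1.0, 1000))
print(len(problems))  # 4000, matching 4 folds x 1000 lambdas
```

Because each `Problem` is just a few numbers, the list is tiny compared with the dataset, which is what makes it cheap to distribute as an RDD.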

SLIDE 12

Code Snippet (cont.)

Input: dataset, problems

Dataset broadcast

Problem RDD
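Since the code itself is not in the text, here is a rough, Spark-free Python simulation of the structure these callouts describe: one read-only dataset shared by every worker (standing in for a broadcast variable) and the problem list cut into partitions that are solved independently. `slice_problems`, `run_problem_rdd`, and `solve` are hypothetical; the 1-D ridge fit is only a placeholder for the GPU elastic-net solver:

```python
def slice_problems(problems, n_slices):
    """Cut the problem list into contiguous, nearly equal slices (one per partition)."""
    n = len(problems)
    return [problems[i * n // n_slices:(i + 1) * n // n_slices] for i in range(n_slices)]

def solve(dataset, problem):
    """Placeholder solver: 1-D ridge fit beta = sum(xy) / (sum(x^2) + n*lam) on all folds but `fold`."""
    fold, lam = problem
    train = [(x, y) for f, x, y in dataset if f != fold]
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return fold, lam, sxy / (sxx + len(train) * lam)

def run_problem_rdd(dataset, problems, n_workers):
    """Simulated broadcast + problem RDD: all workers read the same dataset, each solves only its slice."""
    results = []
    for part in slice_problems(problems, n_workers):     # each slice would be one Spark partition
        results.extend(solve(dataset, p) for p in part)  # no communication between slices
    return results                                       # like collect(): all models at the driver

# Rows are (fold, x, y) with y = 2x; problems are (fold, lambda) pairs.
dataset = [(0, 1.0, 2.0), (1, 2.0, 4.0), (0, 3.0, 6.0), (1, 4.0, 8.0)]
problems = [(f, lam) for f in range(2) for lam in (0.0, 1.0)]
for fold, lam, beta in run_problem_rdd(dataset, problems, n_workers=2):
    print(fold, lam, round(beta, 3))
```

On a real cluster this would roughly correspond to `bc = sc.broadcast(dataset)` followed by `sc.parallelize(problems, n_workers).map(lambda p: solve(bc.value, p)).collect()` in PySpark.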

SLIDE 13

Cross-Validation in Spark Using GPU

[Figure: on worker i, the Dataset (folds 0 to 3) and Problems i feed GPU0 and GPU1 through four cudaStreams; Problems a:0, a:1, a:2, a:3 and b:0, b:1, b:2, b:3 are spread across the streams.]
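One possible reading of the diagram is that each fold gets its own CUDA stream, folds alternate between the two GPUs, and problems for successive λ values (a, b, ...) queue up behind each other on the same stream. The Python sketch below only computes that assignment; the rule "fold f runs on GPU f % n_gpus, one stream per fold" is an assumption inferred from the figure, and `assign_streams` is a hypothetical name:

```python
from collections import defaultdict

def assign_streams(problems, n_gpus):
    """Map (lambda_label, fold) problems to FIFO queues keyed by (gpu, stream).

    Assumed rule (inferred, not confirmed): stream index = fold,
    and fold f runs on GPU f % n_gpus.
    """
    queues = defaultdict(list)
    for lam_label, fold in problems:
        queues[(fold % n_gpus, fold)].append(f"{lam_label}:{fold}")
    return dict(queues)

problems = [(lam, fold) for lam in "ab" for fold in range(4)]
for (gpu, stream), queue in sorted(assign_streams(problems, n_gpus=2).items()):
    print(f"GPU{gpu} stream{stream}: {queue}")
```

Work issued to different CUDA streams may overlap on the device, so keeping several streams fed is one way to hold GPU utilization high.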

SLIDE 14

Cross-Validation in Spark Using GPU

[Figure: workers i, j, and k each hold the Dataset and solve their share of the Problems, distributed as an RDD (Problems i, j, k); a Reduce gathers All Models.]
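The slides say only that a reduce brings back all models. A typical next step, assumed here rather than stated, is to average each λ's validation error over the folds and keep the λ with the lowest mean. A small Python sketch (`best_lambda` is a hypothetical name; in Spark the grouping could run on the driver after `collect()` or as a keyed reduce):

```python
from collections import defaultdict

def best_lambda(results):
    """results: iterable of (fold, lam, val_error). Return (lam, mean_error) with the lowest mean."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _fold, lam, err in results:
        totals[lam] += err
        counts[lam] += 1
    means = {lam: totals[lam] / counts[lam] for lam in totals}
    lam = min(means, key=means.get)
    return lam, means[lam]

results = [(0, 0.1, 1.0), (1, 0.1, 3.0),   # lambda 0.1: mean error 2.0
           (0, 1.0, 2.0), (1, 1.0, 1.0)]   # lambda 1.0: mean error 1.5
print(best_lambda(results))  # (1.0, 1.5)
```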

SLIDE 15

Cross-Validation in Spark Using GPU (Advantages)

  • Dataset Broadcast

– Efficient p2p protocol in Spark
– One-time upfront overhead
– Data reused within GPUs

  • Problem RDD

– No communication among workers
– Multiple streams to maximize GPU utilization

  • Multi-level parallelism

– Functional parallelism from Problem RDD
– Multiple GPUs
– Multiple cudaStreams

SLIDE 16

Experimental Results

  • System

– 2-node cluster
– 32 x86 cores per node
– Two NVIDIA Tesla K40m GPUs per node

  • Software

– Spark 2.0
– OpenJDK 1.8

  • Workload

– Real Watson Health dataset
– 5-fold cross-validation
– 1024 lambda values explored

  • Algorithms

– Logistic regression
– Linear regression

  • Measured end-to-end runtime, including the dataset broadcast
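For scale, the workload and hardware above imply the following problem count; the per-GPU figure assumes problems are spread evenly across GPUs, which the slides do not state:

```latex
\begin{aligned}
\#\text{problems} &= 5\ \text{folds}\times 1024\ \lambda\ \text{values} = 5120,\\
\#\text{GPUs} &= 2\ \text{nodes}\times 2\ \text{K40m per node} = 4,\\
\text{problems per GPU} &= 5120 / 4 = 1280.
\end{aligned}
```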
SLIDE 17

Result: GPU utilization

  • Sustained over 97% multi-GPU utilization

SLIDE 18

Result: Logistic Regression

Chart annotations: with only 2 problems the GPU does not help; with enough problems it helps, and more problems help further; 114x speedup.

SLIDE 19

Result: Linear Regression

94x speedup

SLIDE 20

Conclusion

  • Cross-Validation on Spark using GPU

– New way of parallelization in Spark

  • Broadcast dataset
  • RDD-ized problems

– Reduced communication

  • About 100x speedup for Logistic/Linear Regression + Elastic Net
  • Future work

– Support out-of-core execution