Data Science Applications of GPUs in the R Language

Norm Matloff
University of California at Davis
GTC 2016, April 7, 2016

These slides are at http://heather.cs.ucdavis.edu/GTC.pdf

Why R?

  • The lingua franca for the data science community. (R-Python-Julia battle looming?)
  • Statistically correct: written by statisticians, for statisticians.
  • 8,000 CRAN packages!
  • Excellent graphics capabilities, including Shiny (easily build your own interactive tool).

R → GPU Link Pros and Cons

On the plus side:

  • Speed: R is an interpreted language, so loop-heavy R code can be slow. (Nick Ulle and Duncan Temple Lang are working on an LLVM compiler.)
  • R is often used on large and/or complex data sets, thus requiring large amounts of computation.
  • Much of R computation involves matrices or other operations well-suited to GPUs.

On the other hand:

  • Big Data implies the need for multiple kernel calls, and much host/device traffic.
  • Ditto for R’s many iterative algorithms.
  • Many of the matrix ops are not embarrassingly parallel.
  • Data must be unpacked from, and repacked into, R’s object structures.

Disclaimers

  • The talk is aimed at NVIDIA GPUs but is otherwise generic, not focusing on the latest/greatest model.
  • Our running example, NMF, is meant to illustrate issues and methods concerning the R/GPU interface. It is not claimed to produce the fastest possible computation. (See the talk by Wei Tan in this session.)

Running Example: Nonnegative Matrix Factorization (NMF)

  • Have matrix A ≥ 0, rank r.
  • Want to find matrices W ≥ 0 and H ≥ 0 of rank s ≪ r with A ≈ WH.
  • Columns of W form a “pseudo-basis” for columns of A: A.j is approximately a linear combination of the columns of W, with coordinates in H.j.
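
Written out under the usual conventions (an assumption, since the slide gives neither dimensions nor the error measure): if A is n × m, then W is n × s and H is s × m, one seeks nonnegative W and H that make the Frobenius error small, and column j of A is approximated by a combination of the columns of W weighted by column j of H:

```latex
\min_{W \ge 0,\; H \ge 0} \lVert A - WH \rVert_F^2,
\qquad
A_{\cdot j} \approx W H_{\cdot j} = \sum_{k=1}^{s} H_{kj}\, W_{\cdot k}
```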

Applications of NMF

  • Image compression.
  • Image classification. Each column of A is one image. To classify a new image, find its coordinates u w.r.t. W, then find the nearest neighbor(s) of u among the columns of H.
  • Text classification. Each column of A is one document, holding counts of words of interest. Classification then proceeds as for images.
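
The nearest-neighbor step of the classification recipe can be sketched in C++ as follows. This is a minimal illustration, not code from any package: it assumes the new item's coordinates u have already been computed (for instance by a nonnegative least-squares fit against W, not shown), stores H column-major as R does, uses squared Euclidean distance, and takes a single nearest neighbor. The names nearest_column and classify are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// H is s x m, stored column-major as R stores matrices: entry (k, j) is
// H[k + j * s], and column j holds the coordinates of training item j.
// Returns the index of the column closest to u in squared Euclidean distance.
std::size_t nearest_column(const std::vector<double>& H, std::size_t s,
                           const std::vector<double>& u) {
    assert(s > 0 && u.size() == s && !H.empty() && H.size() % s == 0);
    const std::size_t m = H.size() / s;
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::infinity();
    for (std::size_t j = 0; j < m; ++j) {
        double d = 0.0;
        for (std::size_t k = 0; k < s; ++k) {
            const double diff = H[k + j * s] - u[k];
            d += diff * diff;
        }
        if (d < best_d) {
            best_d = d;
            best = j;
        }
    }
    return best;
}

// 1-nearest-neighbor classification: the new item gets the label of the
// training item whose column of H is closest to u.
int classify(const std::vector<double>& H, std::size_t s,
             const std::vector<int>& labels, const std::vector<double>& u) {
    assert(s > 0 && labels.size() == H.size() / s);
    return labels[nearest_column(H, s, u)];
}
```

With more than one neighbor, one would keep the k smallest distances and vote among their labels.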

Example of R Calling C/C++

  • Compare R’s NMF package to E. Battenberg’s NMF-CUDA, on a 3430 × 512 A:
  • R, s = 10: 649.843 sec
  • GPU, s = 30: 0.986 sec
  • The GPU solved a much bigger problem in much less time,
  • even though the R package’s core code is in C++, not R.
  • Solution: call NMF-CUDA’s update_div() from R. BUT HOW?
  • R’s Rcpp package makes interfacing R to C/C++ very convenient and efficient.

General R/GPU Tools

What’s out there now for R/GPU:

  • gputools (Buckner et al.): The oldest major package. Matrix multiply; matrix of distances between rows; linear model fit; QR decomposition; correlation matrix; hierarchical clustering.
  • HiPLAR (Montana et al.): R wrapper for MAGMA and PLASMA. Linear algebra routines, e.g. Cholesky.
  • rpud (Yau): Similar to gputools, but has SVM.
  • Rth (Matloff): R interfaces to various algorithms coded in Thrust. Matrix of distances between rows; histogram; column sums; Kendall’s tau; contingency table.

Current Tools (cont’d.)

  • gmatrix (Morris): Matrix multiply, matrix subsetting, Kronecker product, row/col sums, Hamiltonian MCMC, Cholesky.
  • RCUDA (Baines and Temple Lang; currently not under active development): Enables calling GPU kernels directly from R. (Kernels are still written in CUDA.)
  • rgpu (Kempenaar; no longer under active development): “Compiles” simple expressions to GPU code.
  • Various OpenCL interfaces: ROpenCL, gpuR. Similar to RCUDA, but via the OpenCL interface.

Example: Linear Regression Via gputools

    > library(gputools)
    > test <- function(n, p) {
         x <- matrix(runif(n*p), nrow=n)
         regvals <- x %*% rep(1.0, p)
         y <- regvals + 0.2 * runif(n)
         xy <- cbind(x, y)
         print("gputools method")
         print(system.time(gpuLm.fit(x, y)))
         print("ordinary method")
         print(system.time(lm.fit(x, y)))
      }
    > test(100000, 1500)
    [1] "gputools method"
       user  system elapsed
      6.280   2.878  17.902
    [1] "ordinary method"
       user  system elapsed
    142.282   0.669 142.912

Key Issue: Keeping Objects on the Device

  • Some packages, notably gputools, do not accept arguments that already reside on the device.
  • So intermediate results cannot be kept on the device, requiring needless host/device copying.
  • Some packages remedy this, e.g. gmatrix.

Example

    library(gputools)
    library(gmatrix)
    n <- 5000
    z <- matrix(runif(n^2), nrow=n)

    # plain R:
    system.time(z %*% z %*% z)
    #    user  system elapsed
    # 138.757   0.322 139.081

    # gputools: the intermediate product z %*% z comes back to the host
    system.time(gpuMatMult(gpuMatMult(z, z), z))
    #    user  system elapsed
    #   6.607   1.170  10.059

    # gmatrix: operands and results stay on the device
    zm <- gmatrix(z, nrow=n, ncol=n)  # creation of zm2, zm3 not shown
    system.time({gmm(zm, zm, zm2); gmm(zm, zm2, zm3)})
    #    user  system elapsed
    #   6.258   1.031   7.285
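
The elapsed times above give the following speedups over plain R (simple ratios of the reported elapsed seconds, one run each):

```latex
\frac{139.081}{10.059} \approx 13.8 \;\;(\text{gputools}), \qquad
\frac{139.081}{7.285} \approx 19.1 \;\;(\text{gmatrix}), \qquad
\frac{7.285}{10.059} \approx 0.72
```

So keeping the intermediate product on the device cut elapsed time by roughly another 28% relative to gputools. That is consistent with the point about keeping objects on the device, though a single timing run per method does not by itself isolate the cause.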

Rth Example — Kendall’s Tau

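A kind of correlation measure, based on the proportion p of concordant pairs: (Xi, Yi) and (Xj, Yj) are concordant if sign(Xi − Xj) · sign(Yi − Yj) > 0. Counting every non-concordant pair as discordant, τ = p − (1 − p) = 2p − 1, which ranges from −1 to 1.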

Kendall’s Tau (cont’d.)

R wrapper for the Thrust code:

    rthkendall <- function(x, y) {
       dyn.load("rthkendall.so")
       n <- length(x)
       # as.single() makes .C() pass float* rather than double*
       tmp <- .C("rthkendall", as.single(x), as.single(y), as.integer(n),
                 tmpres=single(1))
       return(tmp$tmpres)
    }

Kendall’s Tau (cont’d)

    // extern "C" keeps the name unmangled so that .C("rthkendall", ...) finds it
    extern "C" void rthkendall(float *x, float *y, int *nptr, float *tauptr)
    {
       int n = *nptr;
       thrust::counting_iterator<int> seqa(0);
       thrust::counting_iterator<int> seqb = seqa + n - 1;
       // dx, dy, tmp declarations not shown
       thrust::transform(seqa, seqb, tmp.begin(), calcgti(dx, dy, n));
       // sum in double: an int total overflows once n(n-1)/2 exceeds 2^31 - 1
       double totcount = thrust::reduce(tmp.begin(), tmp.end(), 0.0);
       double npairs = n * (n - 1.0) / 2;
       *tauptr = (totcount - (npairs - totcount)) / npairs;
    }

Kendall’s Tau (cont’d)

    struct calcgti {  // handles one i, all j > i
       // more declarations not shown
       // dx, dy taken by reference, so wdx and wdy point into the caller's
       // vectors rather than into temporary copies destroyed on return
       calcgti(floublevec &dx, floublevec &dy, int n) :
          dx(dx), dy(dy), n(n)
       {
          wdx = thrust::raw_pointer_cast(&dx[0]);
          wdy = thrust::raw_pointer_cast(&dy[0]);
       }
       __device__ int operator()(int i)
       {
          flouble xi = wdx[i], yi = wdy[i];
          int j, count = 0;
          for (j = i + 1; j < n; j++)
             count += ((xi - wdx[j]) * (yi - wdy[j]) > 0);
          return count;
       }
    };
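
To see the transform/reduce structure without a GPU, here is a CPU sketch in standard C++ that mirrors rthkendall and calcgti: std::iota plays the role of the counting iterators, std::transform and std::accumulate stand in for thrust::transform and thrust::reduce, and, as in the slide's code, every pair that is not concordant (ties included) counts as discordant. It is an illustration only, not Rth's code; ConcordantAfter and kendall_tau are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// CPU analogue of calcgti: for one index i, count the pairs (i, j), j > i,
// whose x and y differences have the same sign (concordant pairs).
struct ConcordantAfter {
    const std::vector<float>& x;
    const std::vector<float>& y;
    int operator()(int i) const {
        const float xi = x[i], yi = y[i];
        int count = 0;
        for (std::size_t j = i + 1; j < x.size(); ++j)
            count += ((xi - x[j]) * (yi - y[j]) > 0);
        return count;
    }
};

// Mirrors rthkendall: transform the indices 0..n-2 with the functor, sum the
// per-index counts, then tau = (concordant - discordant) / npairs.
double kendall_tau(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const int n = static_cast<int>(x.size());
    std::vector<int> idx(n - 1), tmp(n - 1);
    std::iota(idx.begin(), idx.end(), 0);                // like counting_iterator
    std::transform(idx.begin(), idx.end(), tmp.begin(),  // like thrust::transform
                   ConcordantAfter{x, y});
    const double totcount =                              // like thrust::reduce
        std::accumulate(tmp.begin(), tmp.end(), 0.0);
    const double npairs = n * (n - 1.0) / 2;
    return (totcount - (npairs - totcount)) / npairs;
}
```

Note that the work is uneven: the call for index i scans n − 1 − i partners, so early indices do far more work than late ones. On a GPU that imbalance lands on individual threads, one likely cost of this simple formulation.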

Example: NMF Again

  • The R NMF package and NMF-CUDA use multiplicative update methods.
  • For instance, for the Frobenius norm,

$$W \leftarrow W \circ \frac{A H'}{W H H'}$$

with ∘ and the division taken elementwise, and similarly $H \leftarrow H \circ \frac{W' A}{W' W H}$.

  • Another possibility is to use the alternating least squares method:
      • In odd-numbered iterations, regress each col. of A against the cols. of W, yielding the columns of H.
      • In even-numbered iterations, reverse the roles of W and H (and now with rows).
  • As seen earlier, least-squares estimation can be done fairly well on GPUs.
  • Mult. update is even better suited to GPUs.
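
The multiplicative updates consist only of matrix products and elementwise operations, which is why they map well to GPUs. Below is a small CPU sketch in C++ of the Frobenius-norm updates in the standard Lee and Seung form. Updating H before W, and adding a tiny eps to each denominator, are choices of this sketch rather than details taken from NMF-CUDA or the R NMF package. On a GPU, each mul() call would become a BLAS matrix multiply and each elementwise loop a simple kernel. Mat, mul, transpose, mu_step and frob_err are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal dense row-major matrix, just enough for this sketch.
struct Mat {
    std::size_t r, c;
    std::vector<double> v;
    Mat(std::size_t rows, std::size_t cols, double fill = 0.0)
        : r(rows), c(cols), v(rows * cols, fill) {}
    double& operator()(std::size_t i, std::size_t j) { return v[i * c + j]; }
    double operator()(std::size_t i, std::size_t j) const { return v[i * c + j]; }
};

Mat mul(const Mat& a, const Mat& b) {          // a %*% b
    assert(a.c == b.r);
    Mat out(a.r, b.c);
    for (std::size_t i = 0; i < a.r; ++i)
        for (std::size_t k = 0; k < a.c; ++k)
            for (std::size_t j = 0; j < b.c; ++j)
                out(i, j) += a(i, k) * b(k, j);
    return out;
}

Mat transpose(const Mat& a) {                  // t(a)
    Mat out(a.c, a.r);
    for (std::size_t i = 0; i < a.r; ++i)
        for (std::size_t j = 0; j < a.c; ++j)
            out(j, i) = a(i, j);
    return out;
}

// One round of multiplicative updates for the Frobenius objective:
//   H <- H o (W'A) / (W'WH),  then  W <- W o (AH') / (WHH'),
// with o and / elementwise; eps keeps the denominators away from zero.
void mu_step(const Mat& A, Mat& W, Mat& H, double eps = 1e-12) {
    const Mat Wt = transpose(W);
    Mat num = mul(Wt, A);
    Mat den = mul(mul(Wt, W), H);
    for (std::size_t i = 0; i < H.v.size(); ++i)
        H.v[i] *= num.v[i] / (den.v[i] + eps);
    const Mat Ht = transpose(H);               // uses the updated H
    num = mul(A, Ht);
    den = mul(W, mul(H, Ht));
    for (std::size_t i = 0; i < W.v.size(); ++i)
        W.v[i] *= num.v[i] / (den.v[i] + eps);
}

// Squared Frobenius norm of A - WH.
double frob_err(const Mat& A, const Mat& W, const Mat& H) {
    const Mat wh = mul(W, H);
    double s = 0.0;
    for (std::size_t i = 0; i < A.v.size(); ++i) {
        const double d = A.v[i] - wh.v[i];
        s += d * d;
    }
    return s;
}
```

Because every factor is nonnegative, starting from nonnegative W and H keeps them nonnegative without any explicit projection.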

RCUDA Example: Normal Density

Basic goal: call CUDA kernels from R without burdening the R programmer with details of configuring grids, allocating device memory, copying between host and device, etc. Kernel:

    extern "C" __global__ void dnorm_kernel(float *vals, int n, float mu, float sig)
    {
       int myblock = blockIdx.x + blockIdx.y * gridDim.x;
       int blocksize = blockDim.x * blockDim.y * blockDim.z;
       int subthread = threadIdx.z * (blockDim.x * blockDim.y) +
                       threadIdx.y * blockDim.x + threadIdx.x;
       int idx = myblock * blocksize + subthread;
       if (idx >= n) return;  // the grid may launch more threads than elements
       float std = (vals[idx] - mu) / sig;
       float e = exp(-0.5 * std * std);
       vals[idx] = e / (sig * sqrt(2 * 3.14159));
    }

RCUDA (cont’d.)

    n = 1e6
    mean = 2.3
    sd = 2.1
    x = rnorm(n, mean, sd)

    # evaluate density at all points in x
    m = loadModule("dnorm.ptx")
    k = m$dnorm_kernel
    ans = .cuda(k, x, n, mean, sd, gridDim = c(62, 32), blockDim = 512)
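
The launch configuration in this call can be checked with a little arithmetic. With blockDim = 512 and n = 10^6 elements, ceil(10^6 / 512) = 1954 blocks are needed; fixing the grid's y extent at 32 and rounding x up gives c(62, 32), i.e. 62 × 32 × 512 = 1,015,808 threads. So 15,808 threads have no element to process, and the kernel must check idx against n before touching vals. The C++ helpers below are a hypothetical sketch of that calculation; the fixed y = 32 is an assumption that happens to reproduce the slide's values, not necessarily how the author chose them. (Older GPUs cap gridDim.x at 65535, one common reason for 2-D grids.)

```cpp
#include <cassert>

struct Grid2D { long x, y; };

// Blocks needed so that blocks * threads_per_block >= n (ceiling division).
long blocks_needed(long n, long threads_per_block) {
    assert(n > 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Lay the blocks out as a 2-D grid with a fixed y extent, as in
// gridDim = c(x, y); x is rounded up, so the grid may overshoot n.
Grid2D grid_for(long n, long threads_per_block, long grid_y) {
    assert(grid_y > 0);
    const long blocks = blocks_needed(n, threads_per_block);
    return Grid2D{(blocks + grid_y - 1) / grid_y, grid_y};
}

// Total threads launched by a 2-D grid of 1-D blocks.
long threads_launched(Grid2D g, long threads_per_block) {
    return g.x * g.y * threads_per_block;
}
```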

Helpful Utilities

  • Rcpp
      • Greatly facilitates calling C/C++ from R.
      • Base R offers the functions .C() and .Call(). The former is inefficient (it copies its arguments) and the latter requires knowledge of R internals.
      • Rcpp makes it easy.
  • bigmemory
      • R is currently not completely 64-bit.
      • It can represent integers exactly up to 52 bits (as doubles), but matrix row/col dimensions are limited to 32 bits.
      • The bigmemory package allows storing R matrices in “C land,” circumventing R storage limits.
      • Storage is in shared memory, thus allowing for multicore use (e.g. Rdsm).
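
For contrast with Rcpp, here is what the base-R .C() convention requires on the C/C++ side: the function returns void, every argument arrives as a pointer, and results are written back through a pointer argument. The function colsums is a hypothetical example (column sums, one of the operations Rth offers on the GPU), written for a column-major matrix as R stores it; extern "C" prevents C++ name mangling so that .C() can find the symbol.

```cpp
#include <cassert>

// Callable through R's .C(): returns void, takes every argument by pointer,
// and writes its result through the last pointer. The matrix x is
// nrow x ncol in column-major order, so element (i, j) is x[i + j * nrow].
extern "C" void colsums(double *x, int *nrowp, int *ncolp, double *out) {
    const int nrow = *nrowp, ncol = *ncolp;
    assert(nrow >= 0 && ncol >= 0);
    for (int j = 0; j < ncol; ++j) {
        double s = 0.0;
        for (int i = 0; i < nrow; ++i)
            s += x[i + static_cast<long long>(j) * nrow];  // avoid int overflow
        out[j] = s;
    }
}
```

After compiling this into a shared library and loading it with dyn.load(), the R call would look like .C("colsums", as.double(m), as.integer(nrow(m)), as.integer(ncol(m)), out = double(ncol(m)))$out. Because .C() copies each argument on the way in and out, this is also where the inefficiency noted above comes from.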

Software Alchemy

  • For “statistical” problems in “iid” form. Image and text classification fit this form.
  • Simple idea:
      • Break the data into “independent” chunks.
      • Apply the procedure, e.g. logistic regression, to each chunk.
      • Use a combining op, e.g. averaging, for the final answer.
  • Provably correct and statistically efficient (asymptotically as accurate as the full-data estimator).
  • A variant: apply the procedure to the chunks, but take the combining op to be concatenation rather than averaging.
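
Here is a minimal C++ sketch of the chunk-and-average idea, with one plainly stated substitution: the per-chunk procedure is a simple linear-regression slope rather than logistic regression, to keep the example short. It assumes the rows are already in random order, so that contiguous chunks behave like independent samples; ols_slope and alchemy_slope are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Least-squares slope of y on x (with intercept) over rows lo..hi-1.
double ols_slope(const std::vector<double>& x, const std::vector<double>& y,
                 std::size_t lo, std::size_t hi) {
    const double m = static_cast<double>(hi - lo);
    double mx = 0.0, my = 0.0;
    for (std::size_t i = lo; i < hi; ++i) { mx += x[i]; my += y[i]; }
    mx /= m;
    my /= m;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = lo; i < hi; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    return sxy / sxx;
}

// Software Alchemy: split the rows into `chunks` contiguous pieces, fit each
// piece independently, and average the per-chunk estimates.
double alchemy_slope(const std::vector<double>& x, const std::vector<double>& y,
                     std::size_t chunks) {
    assert(x.size() == y.size() && chunks > 0 && x.size() >= 2 * chunks);
    const std::size_t n = x.size();
    double sum = 0.0;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t lo = c * n / chunks, hi = (c + 1) * n / chunks;
        sum += ols_slope(x, y, lo, hi);
    }
    return sum / chunks;
}
```

Each ols_slope call touches only its own chunk, so the loop can be farmed out to separate cores, or run one chunk at a time on a GPU whose memory cannot hold all the data.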

Serial Benefits of Software Alchemy

  • SA gives a speedup even in the serial case, if the task is $O(n^c)$ for $c > 1$.
  • Use SA to address a common problem, big data but small GPU memory: apply the GPU to each chunk serially, then run the combining op.
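
The serial speedup follows from counting work. If the procedure costs about n^c on n observations, then running it on m chunks of size n/m, one after another, costs (ignoring constant factors and the combining step)

```latex
m \left(\frac{n}{m}\right)^{c} = \frac{n^{c}}{m^{c-1}},
\qquad \text{e.g. } c = 2,\ m = 10:\ \ \frac{n^{2}}{10},
```

a factor of m^(c−1) less than the unchunked run. For c = 1 the factor is 1, which is why the claim requires c > 1.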

Example: NMF

  • E.g. break the rows or columns into m chunks.
  • Get an approximation WH for each one.
  • To predict a new case:
      • Get the m predictions.
      • Combine them via voting.
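
A sketch of the voting combiner in C++, assuming each of the m chunk models has already produced an integer class label for the new case; majority_vote is a hypothetical name, and breaking ties toward the smallest label is an arbitrary choice of this sketch.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Combine per-chunk class predictions by majority vote. std::map iterates
// labels in ascending order and only a strictly larger count replaces the
// current winner, so ties go to the smallest label.
int majority_vote(const std::vector<int>& predictions) {
    assert(!predictions.empty());
    std::map<int, int> counts;
    for (int p : predictions) ++counts[p];
    int best = counts.begin()->first, best_count = 0;
    for (const auto& kv : counts) {
        if (kv.second > best_count) {
            best = kv.first;
            best_count = kv.second;
        }
    }
    return best;
}
```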