Julia: A Fresh Approach to GPU Computing
What is Julia?

Technical computing language
High-level like Python
Performance of C

function mandel(z)
    c = z
    maxiter = 80
    for n = 1:maxiter
        if abs(z) > 2
            return n-1
        end
        z = z^2 + c
    end
    return maxiter
end
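The mandel function on this slide is plain Julia and runs on any numeric type; a quick runnable sketch of exercising it (the sample inputs are illustrative):

```julia
function mandel(z)
    c = z
    maxiter = 80
    for n = 1:maxiter
        if abs(z) > 2
            return n - 1
        end
        z = z^2 + c
    end
    return maxiter
end

mandel(3.0)               # escapes immediately: abs(3) > 2, so 0 iterations
mandel(0.0)               # never escapes: returns maxiter (80)
mandel(complex(-1, 0.2))  # works on Complex numbers with no code changes
```

The same source compiles to a specialized method for each argument type, which is what later makes it portable to the GPU.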
Julia for GPU programming
What is GPU programming?
High-level → Low-level:
TensorFlow, Keras → ArrayFire, Thrust → cuBLAS, cuDNN → CUB, MGPU → CUDA C
Julia counterparts: Flux.jl, Knet.jl → GPUArrays.jl → CuArrays.jl → CUDAnative.jl
Programming with libraries
cuBLAS.jl, cuFFT.jl, cuRAND.jl, cuSPARSE.jl, cuDNN.jl, cuSOLVER.jl, CuArrays.jl

a = CuArray(Float32, 2)
b = curand(Float32, 2)
a*a
fft(a)
qrfact(a)
softmax(a)
Programming with kernels
Much harder!
Designed for performance
Multiple dispatch
function foo(x)
    if isa(x, Int64)
        …
    elseif isa(x, Float64)
        …
    end
end

foo(x::Int64) = …
foo(x::Float64) = …
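A minimal runnable illustration of the dispatch style above, with placeholder method bodies of my own (the slide elides them):

```julia
# Two methods of one function; Julia selects the method from the
# argument's run-time type instead of branching inside one body.
describe(x::Int64)   = "integer: $x"
describe(x::Float64) = "float: $x"

describe(1)    # dispatches to the Int64 method
describe(1.0)  # dispatches to the Float64 method
```

Because the method is chosen per concrete type, each specialization can be compiled to straight-line machine code with no type checks left in it.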
Type inference

function sigmoid(x)
    temp = exp(-x)
    return (1 / (1+temp))
end
function sigmoid(x::Int)
    temp = exp(-x)::Float64
    return (1 / (1+temp))::Float64
end
function sigmoid(x::Float32)
    temp = exp(-x)::Float32
    return (1 / (1+temp))::Float32
end
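The inferred types shown on these slides can be checked directly; a small runnable sketch, CPU-only:

```julia
# One generic definition; Julia infers and specializes per argument type.
sigmoid(x) = 1 / (1 + exp(-x))

sigmoid(0)     # Int argument: exp promotes to Float64, result is Float64
sigmoid(0.0f0) # Float32 argument: the whole computation stays Float32
```

Keeping a Float32 input Float32 throughout matters on GPUs, where single precision is often several times faster than double.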
Machine-native types
Multiple dispatch
Type inference
Machine-native types
Specializing JIT compiler
→ High-quality stand-alone machine code
Extensible language
Source → AST → Julia IR → LLVM IR → Machine code
(each stage can be inspected, injected into, and configured from the source language)
How does it look?
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

a = CuArray(randn(2,2))
b = CuArray(randn(2,2))
c = similar(a)
@cuda threads=4 vadd(a,b,c)

No DSL, no subset, just Julia
CUDA abstraction level
Performance parity
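As a rough mental model of what the @cuda launch does, here is a CPU-only sketch that runs the kernel body once per simulated thread index. The vadd_cpu! name and the serial loop are my own emulation, not part of CUDAnative.jl; on the GPU every iteration runs as its own hardware thread:

```julia
# CPU-only emulation: run the kernel body once per "thread" index i.
function vadd_cpu!(a, b, c; threads=length(c))
    for i in 1:threads          # on the GPU these iterations run in parallel
        c[i] = a[i] + b[i]      # same body as the vadd kernel
    end
    return c
end

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = similar(a)
vadd_cpu!(a, b, c)  # → [11.0, 22.0, 33.0, 44.0]
```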
How does it run?
How does it work?
vadd + CuArray{Float64,2} → LLVM IR → PTX
vadd → LLVM IR → PTX
Run-time JIT compiler
Fully transparent
No overhead!
vadd + CuArray{Float64,2} → LLVM IR → PTX
vadd + CuArray{Int32,3} → LLVM IR → PTX
High-level GPU programming
((a .+ b) ./ d) .- e

Great performance
Clean & concise
Generic code
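The dotted expression above works identically on ordinary CPU arrays, and Julia fuses all the broadcasts into a single traversal; moving it to the GPU only requires the inputs to be CuArrays. A plain-array sketch with made-up sample values:

```julia
a = [1.0, 2.0]
b = [3.0, 4.0]
d = [2.0, 2.0]
e = [1.0, 1.0]

# The three dotted ops fuse into one loop (one GPU kernel on CuArrays),
# with no intermediate array allocated per operation.
result = ((a .+ b) ./ d) .- e  # → [1.0, 2.0]
```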
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

W = randn(2, 10)
b = randn(2)
f(x) = softmax(W * x .+ b)

model = Chain(
    Dense(10, 5, σ),
    Dense(5, 2),
    softmax)

From GPU kernels, to differentiable algorithms, to high-level layer stacking. All on one platform.
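For intuition about the softmax used above: it normalizes a vector into probabilities that sum to 1. A minimal stand-in definition of my own (not Flux's implementation, which additionally guards against overflow by subtracting the maximum):

```julia
# Naive softmax sketch: exponentiate, then normalize so entries sum to 1.
naive_softmax(x) = exp.(x) ./ sum(exp.(x))

p = naive_softmax([1.0, 2.0, 3.0])
sum(p)  # → 1.0 (up to floating-point rounding)
```

Because it is written with dots, the same one-liner broadcasts and fuses on CuArrays just like the expression on the previous slide.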
Pkg.add("Flux")
Differential Equations
Machine Learning
CUDA
Automatic Differentiation
Everything You Build
The Julia Magic
Everything just works with everything else!
All the HPC Tooling
Generic programming is extremely powerful.

StructOfArrays.jl
DistributedArrays.jl
JuliaDB.jl & DataFrames.jl
Deep Learning
Differential Equations
Operations Research
function model(tree)
    if isleaf(tree)
        tree.value
    else
        model(tree.left) + model(tree.right)
    end
end
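To run the recursive model above you need some tree type; a minimal sketch with hypothetical Leaf/Node structs and an isleaf predicate of my own, since the slide leaves them out (any tree exposing value, left, right and an isleaf test would do):

```julia
# Hypothetical tree types for illustration.
struct Leaf
    value::Float64
end
struct Node
    left
    right
end

isleaf(t) = t isa Leaf

# The recursive model from the slide: sum every leaf value.
function model(tree)
    if isleaf(tree)
        tree.value
    else
        model(tree.left) + model(tree.right)
    end
end

t = Node(Leaf(1.0), Node(Leaf(2.0), Leaf(3.0)))
model(t)  # → 6.0
```

Because model is generic, the same code works whether the leaf values are Float64s, dual numbers from an AD package, or arrays living on the GPU.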