SLIDE 1

Julia

A Fresh Approach to GPU Computing

SLIDE 2

What is Julia?

Technical computing language
High-level like Python
Performance of C

function mandel(z)
    c = z
    maxiter = 80
    for n = 1:maxiter
        if abs(z) > 2
            return n - 1
        end
        z = z^2 + c
    end
    return maxiter
end
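
Since mandel constrains none of its argument types, the same method runs for any numeric input; a quick usage sketch:

mandel(0.5 + 0.25im)   # iteration count for a complex input
mandel(0.3)            # the identical code also accepts reals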

SLIDE 3

Julia for GPU programming

SLIDE 4

What is GPU programming?

High-level ↔ Low-level

CUDA ecosystem:  TensorFlow, Keras | ArrayFire, Thrust | cuBLAS, cuDNN | CUB, MGPU | CUDA C
Julia ecosystem: Flux.jl, Knet.jl | GPUArrays.jl | CuArrays.jl | CUDAnative.jl

SLIDE 5

Programming with libraries

cuBLAS.jl
cuFFT.jl
cuRAND.jl
cuSPARSE.jl
cuDNN.jl
cuSOLVER.jl

CuArrays.jl:
a = CuArray(Float32, 2)
b = curand(Float32, 2)
a * a
fft(a)
qrfact(a)
softmax(a)
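
A minimal sketch of the workflow these wrappers enable (using the CuArrays-era API shown above): upload with CuArray, compute on the device, copy back with Array.

using CuArrays

x = CuArray(rand(Float32, 1024))   # upload host data to the GPU
y = x .* 2f0 .+ 1f0                # broadcast runs on the device
Array(y)                           # copy the result back to the host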

SLIDE 6

Programming with kernels

Much harder!

SLIDE 7

Designed for performance

Multiple dispatch

function foo(x)
    if isa(x, Int64)
        …
    elseif isa(x, Float64)
        …
    end
end

foo(x::Int64) = …
foo(x::Float64) = …
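
A minimal sketch of the difference (the method bodies are hypothetical, since the slide elides them): the first version branches on types at run time, while dispatch selects and specializes a method per call site at compile time.

foo_branch(x) = isa(x, Int64) ? x + 1 : x + 0.5   # one body, run-time check

foo(x::Int64)   = x + 1    # one method per type,
foo(x::Float64) = x + 0.5  # chosen by the compiler

foo(2)    # compiles to the Int64 method, no type check left in the code
foo(2.0)  # compiles to the Float64 method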

SLIDE 8

Designed for performance

Type inference

function sigmoid(x)
    temp = exp(-x)
    return (1 / (1 + temp))
end

SLIDE 9

Designed for performance

Type inference

function sigmoid(x::Int)
    temp = exp(-x)::Float64
    return (1 / (1 + temp))::Float64
end

SLIDE 10

Designed for performance

Type inference

function sigmoid(x::Float32)
    temp = exp(-x)::Float32
    return (1 / (1 + temp))::Float32
end

Machine-native types
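
What inference concluded can be verified with the standard reflection macro @code_warntype; a minimal sketch (InteractiveUtils ships with Julia):

using InteractiveUtils

sigmoid(x) = 1 / (1 + exp(-x))
@code_warntype sigmoid(1.0f0)   # every intermediate is inferred as Float32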

SLIDE 11

Designed for performance

Multiple dispatch
Type inference
Machine-native types
Specializing JIT compiler

High-quality, stand-alone machine code

SLIDE 12

Extensible language

Source → AST → Julia IR → LLVM IR → Machine code

From the source level you can inspect & inject at each stage, and configure the pipeline.
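
Each of those stages can be inspected from ordinary Julia code with the built-in reflection macros; a minimal sketch:

using InteractiveUtils

f(x) = x^2 + 1
@code_lowered f(2)   # lowered AST
@code_typed   f(2)   # Julia IR after type inference
@code_llvm    f(2)   # LLVM IR
@code_native  f(2)   # machine code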

SLIDE 13

How does it look?

using CUDAnative, CuArrays

function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

a = CuArray(randn(2, 2))
b = CuArray(randn(2, 2))
c = similar(a)
@cuda threads=4 vadd(a, b, c)

No DSL, no subset, just Julia
CUDA abstraction level
Performance parity
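
After the @cuda call above, the result can be checked on the host; a one-line sketch (Array copies device memory back to the CPU):

Array(c) ≈ Array(a) + Array(b)   # true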

SLIDE 14

How does it run?

SLIDE 15

How does it work?

function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

a = CuArray(randn(2, 2))
b = CuArray(randn(2, 2))
c = similar(a)
@cuda threads=4 vadd(a, b, c)

vadd → vadd(::CuArray{Float64,2}) → LLVM IR → PTX

SLIDE 16

How does it work?

Run-time JIT compiler: fully transparent, no overhead!

vadd(::CuArray{Float64,2}) → LLVM IR → PTX
vadd(::CuArray{Int32,3}) → LLVM IR → PTX
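
In practice, calling the same kernel with new argument types simply triggers another specialization; a minimal sketch reusing vadd from above:

a = CuArray(randn(2, 2))
@cuda threads=4 vadd(a, a, similar(a))   # compiles vadd for CuArray{Float64,2}

b = CuArray(rand(Int32(1):Int32(10), 2, 2))
@cuda threads=4 vadd(b, b, similar(b))   # recompiles, transparently, for Int32 arrays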

SLIDE 21

High-level GPU programming

((a .+ b) ./ d) .- e

Great performance
Clean & concise
Generic code
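
Julia fuses all the dotted operations in such an expression into a single loop, so on the GPU only one kernel is launched; a minimal sketch with CuArrays:

using CuArrays

a = CuArray(rand(Float32, 1024))
b = CuArray(rand(Float32, 1024))
d = CuArray(rand(Float32, 1024))
e = CuArray(rand(Float32, 1024))

r = ((a .+ b) ./ d) .- e   # one fused GPU kernel, no temporaries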

SLIDE 23

function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

W = randn(2, 10)
b = randn(2)
f(x) = softmax(W * x .+ b)

model = Chain(
    Dense(10, 5, σ),
    Dense(5, 2),
    softmax)

From GPU kernels, to differentiable algorithms, to high-level layer stacking. All on one platform.
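
Moving such a model to the GPU is a one-liner with Flux's gpu helper; a minimal sketch with hypothetical input data:

using Flux, CuArrays

model = Chain(Dense(10, 5, σ), Dense(5, 2), softmax) |> gpu
x = rand(Float32, 10) |> gpu   # hypothetical input
model(x)                       # the forward pass runs on the GPU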

SLIDE 26

Pkg.add("Flux")

SLIDE 27

Differential Equations · Machine Learning · CUDA · Automatic Differentiation

Everything You Build

The Julia Magic

Everything just works with everything else!

SLIDE 29

All the HPC Tooling

Generic programming is extremely powerful.

StructOfArrays.jl
DistributedArrays.jl
JuliaDB.jl & DataFrames.jl
Deep Learning
Differential Equations
Operations Research
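
A minimal sketch of why generic code pays off: one method, written once with no GPU code in sight, runs on any of those array types.

relu_sum(xs) = sum(max.(xs, 0))   # generic: no array type mentioned

relu_sum(randn(1000))             # plain Array on the CPU
using CuArrays
relu_sum(CuArray(randn(1000)))    # the same method on the GPU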

SLIDE 31

function model(tree)
    if isleaf(tree)
        tree.value
    else
        model(tree.left) + model(tree.right)
    end
end
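
The slide leaves the tree types implicit; hypothetical Leaf/Node types and an isleaf helper make the snippet runnable:

struct Leaf
    value::Float64
end

struct Node
    left
    right
end

isleaf(t) = t isa Leaf   # hypothetical helper assumed by model

tree = Node(Leaf(1.0), Node(Leaf(2.0), Leaf(3.0)))
model(tree)   # 6.0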

SLIDE 34

Case Studies

SLIDE 35

JuliaCon – 300 Attendees, 150 Talks

SLIDE 36

https://github.com/JuliaGPU/
NVIDIA Parallel Forall blog
https://github.com/FluxML/
