A Compiler Intermediate Representation for Stencils

SLIDE 1

spcl.inf.ethz.ch @spcl_eth

A Compiler Intermediate Representation for Stencils

JEAN-MICHEL GORIUS, TOBIAS WICKY, TOBIAS GROSSER, AND TOBIAS GYSI

“Climate change is now affecting every country on every continent. It is disrupting national economies and affecting lives, costing people, communities and countries dearly today and even more tomorrow. Weather patterns are changing, sea levels are rising, weather events are becoming more extreme and greenhouse gas emissions are now at their highest levels in history.” - United Nations, Sustainable Development Goals

SLIDE 2

Open Climate Compiler Initiative

SLIDE 3

COSMO Atmospheric Model

  • Regional atmospheric model used by 7 national weather services
  • Implements many different stencil programs

SLIDE 4

What resolution is needed to predict if there is snow out of the banner cloud at Matterhorn?

Resolution (35m)

SLIDE 5

SLIDE 6

What resolution is needed to predict if there is snow out of the banner cloud at Matterhorn?

Resolution (70m)

SLIDE 7

What resolution is needed to predict if there is snow out of the banner cloud at Matterhorn?

Resolution (140m)

SLIDE 8

What resolution is needed to predict if there is snow out of the banner cloud at Matterhorn?

Resolution (280m)

SLIDE 9

What resolution is needed to predict if there is snow out of the banner cloud at Matterhorn?

Resolution (560m)

SLIDE 10

What resolution is needed to predict if there is snow out of the banner cloud at Matterhorn?

Resolution (1.1km – Weather Forecast Today)

SLIDE 11

What resolution is needed to predict if there is snow out of the banner cloud at Matterhorn?

Resolution (2.2km – Weather Forecast 2015)

SLIDE 12

Achieving High-Performance, Portability, and Productivity

  • 1998, COSMO: Fortran code, optimized for vector machines
  • 2010, Stella/GridTools: DSL embedded in C++, GPU and CPU support, performance & portability
  • 2015: 1st GPU model running in production
  • 2017, Dawn (GTClang): domain-specific compiler, front-end language agnostic, powerful analysis and optimization passes, productivity

SLIDE 13

Domain-Science vs Computer-Science

lap(i,j) = -4.0 * in(i,j) + in(i-1,j) + in(i+1,j) + in(i,j-1) + in(i,j+1)

  • element-wise computation
  • fixed neighborhood
  • solve PDE
  • finite differences
  • structured grid
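As a point of reference, the Laplacian above maps directly onto a plain C loop nest. This is a minimal sketch, not code from Dawn: the row-major layout, the function name `laplacian_2d`, and the choice to skip the one-cell boundary are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major index of point (i, j) in an nx-by-ny grid; i is the fast index. */
static size_t idx(size_t i, size_t j, size_t nx) { return j * nx + i; }

/* lap(i,j) = -4.0 * in(i,j) + in(i-1,j) + in(i+1,j) + in(i,j-1) + in(i,j+1)
 * Only interior points are written: the one-cell boundary is skipped because
 * its neighbors would lie outside the array. */
void laplacian_2d(const double *in, double *lap, size_t nx, size_t ny) {
    assert(nx >= 3 && ny >= 3);
    for (size_t j = 1; j + 1 < ny; j++) {
        for (size_t i = 1; i + 1 < nx; i++) {
            lap[idx(i, j, nx)] = -4.0 * in[idx(i, j, nx)]
                               + in[idx(i - 1, j, nx)] + in[idx(i + 1, j, nx)]
                               + in[idx(i, j - 1, nx)] + in[idx(i, j + 1, nx)];
        }
    }
}
```

Each output point reads only `in`, so the iterations of both loops are independent; this is what lets a stencil compiler parallelize the horizontal dimensions freely.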
SLIDE 14

Algorithmic Motifs – Finite Differences

  • stencils (no loop-carried dependencies)
  • mostly horizontal dependencies

SLIDE 15

Algorithmic Motifs – Tridiagonal Systems

  • vertical dependencies
  • loop-carried dependencies
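A common way to solve such a vertical tridiagonal system is the Thomas algorithm, sketched below in C for a single column. It is shown only to illustrate the loop-carried dependencies along k; the function name `thomas_solve`, the argument layout, and the assumption of a diagonally dominant system (so no pivoting is needed) are ours, not the talk's.

```c
#include <assert.h>
#include <stddef.h>

/* Solve a[k]*x[k-1] + b[k]*x[k] + c[k]*x[k+1] = d[k] for k = 0 .. n-1
 * (a[0] and c[n-1] are ignored) with the Thomas algorithm.
 * c and d are overwritten with the modified coefficients; the solution is
 * written to x. Assumes a diagonally dominant system, so no pivoting. */
void thomas_solve(const double *a, const double *b, double *c, double *d,
                  double *x, size_t n) {
    assert(n >= 1);
    /* Forward sweep: level k uses the values just computed at level k-1,
     * a loop-carried dependency along the vertical. */
    c[0] /= b[0];
    d[0] /= b[0];
    for (size_t k = 1; k < n; k++) {
        double m = b[k] - a[k] * c[k - 1];
        c[k] /= m;
        d[k] = (d[k] - a[k] * d[k - 1]) / m;
    }
    /* Backward substitution: level k uses x[k+1]. */
    x[n - 1] = d[n - 1];
    for (size_t k = n - 1; k-- > 0;) {
        x[k] = d[k] - c[k] * x[k + 1];
    }
}
```

The forward sweep depends on level k-1 and the backward sweep on level k+1, so the k loop must stay sequential; parallelism comes from solving many (i, j) columns at once.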

SLIDE 16

Architecture of the Dawn Compiler

DSL Code → Front End → High-Level IR

Front End

stencil average {
  storage in, out;
  Do {
    vertical_region(kstart, kend) {
      out[i,j,k] = 0.5 * (in[i-1,j,k] + in[i+1,j,k]);
    }
  }
};
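One way to read the semantics of `average` is as the naive loop nest below. This is a hedged sketch, not Dawn's generated code: it assumes a dense ni x nj x nk field with i as the fastest index, maps `vertical_region(kstart, kend)` to the full k range, and restricts i to the interior so that `in[i-1]` and `in[i+1]` stay in bounds. `average_stencil` and `at` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened index into an ni x nj x nk field, i fastest. */
static size_t at(size_t i, size_t j, size_t k, size_t ni, size_t nj) {
    return (k * nj + j) * ni + i;
}

/* out[i,j,k] = 0.5 * (in[i-1,j,k] + in[i+1,j,k]) for every k and j,
 * and for the horizontal interior i = 1 .. ni-2. */
void average_stencil(const double *in, double *out,
                     size_t ni, size_t nj, size_t nk) {
    assert(ni >= 3);
    for (size_t k = 0; k < nk; k++)              /* vertical_region(kstart, kend) */
        for (size_t j = 0; j < nj; j++)
            for (size_t i = 1; i + 1 < ni; i++)
                out[at(i, j, k, ni, nj)] =
                    0.5 * (in[at(i - 1, j, k, ni, nj)] + in[at(i + 1, j, k, ni, nj)]);
}
```

All three loops are free of loop-carried dependencies here, so the loop order is a choice the compiler can make for locality rather than correctness.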

SLIDE 17

Architecture of the Dawn Compiler

DSL Code → Front End → High-Level IR → Parallelizer → Low-Level IR

Parallelizer

  • add synchronization
  • solve data races
  • safety checks
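To make the data-race point concrete, the sketch below shows a much simplified version of the kind of check a parallelizer could perform. It assumes a hypothetical representation in which each statement writes one field at offset (0, 0, 0) and lists the offsets it reads; if a later statement reads the earlier statement's output at a nonzero horizontal offset, a synchronization point is needed between them. The types `access_t` and `stmt_t` and the function `needs_sync` are illustrative, not Dawn's API, and vertical dependencies are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A field access: field id plus its (i, j, k) offset. */
typedef struct { int field; int di, dj, dk; } access_t;

/* A statement writes one field at offset (0, 0, 0) and reads a list of accesses. */
typedef struct {
    access_t write;
    const access_t *reads;
    size_t num_reads;
} stmt_t;

/* True if `later` reads the field written by `earlier` at a nonzero
 * horizontal offset. In a horizontally parallel loop, such a read may see a
 * neighbor's value before it has been written, so the two statements must be
 * separated by a synchronization point (e.g. a new stage or kernel). */
bool needs_sync(const stmt_t *earlier, const stmt_t *later) {
    assert(earlier != NULL && later != NULL);
    for (size_t r = 0; r < later->num_reads; r++) {
        const access_t *a = &later->reads[r];
        if (a->field == earlier->write.field && (a->di != 0 || a->dj != 0))
            return true;
    }
    return false;
}
```

For a pair like `lap = laplacian(u)` followed by `out = laplacian(lap)`, the second statement reads `lap` at i±1 and j±1, so this check would fire; splitting the computation into separate vertical regions is one way to provide the required synchronization.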
SLIDE 18

Architecture of the Dawn Compiler

DSL Code → Front End → High-Level IR → Parallelizer → Low-Level IR → Optimizer → Low-Level IR

Optimizer

  • data-locality
  • caching
  • memory footprint
SLIDE 19

Architecture of the Dawn Compiler

DSL Code → Front End → High-Level IR → Parallelizer → Low-Level IR → Optimizer → Low-Level IR → Code Generator → Optimized Code

Code Generator

  • CUDA
  • GridTools
  • Debug
SLIDE 20

Stella, gtclang DSL, Gridtools, Snowflake, Exastencil, Lift, MSL, Nebo-Wasatch, Simflowny DSL, Stincilla, OpenSBLI, PATUS, Halide, Multi-Stencil, Hipacc, Physis, PADS, SDSL, AMRStencil, Liszt, NUMA, DACE, Firedrake

SLIDE 21

SLIDE 22

Climate Stencil Compilation with MLIR

DSL frontend → MLIR (Stencil → Stencil → Affine → Std Ops → GPU) → NVVM / ROCDL

SLIDE 23

GPU Execution Model and Optimizations

  • sequential loop
  • vectorization
  • loop shifting & fusion
  • sliding window in registers
  • loop tiling
  • stencil inlining
  • overlapped tiling

Figure: GPU execution model over the k, i, and j dimensions, with threads (grouped in blocks) and shared memory.
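As an illustration of the sliding-window optimization, the C sketch below computes a three-point vertical sum along a sequential k loop for one column. The stencil itself (`out[k] = in[k-1] + in[k] + in[k+1]`) and the name `vertical_sum_sliding` are assumptions chosen for brevity; on a GPU, each thread would typically own one (i, j) column and run a loop like this.

```c
#include <assert.h>
#include <stddef.h>

/* out[k] = in[k-1] + in[k] + in[k+1] for k = 1 .. n-2, computed along a
 * sequential k loop. The three-point window lives in local variables
 * (registers), so each iteration loads one new value instead of three. */
void vertical_sum_sliding(const double *in, double *out, size_t n) {
    assert(n >= 3);
    double below = in[0], center = in[1];
    for (size_t k = 1; k + 1 < n; k++) {
        double above = in[k + 1];   /* the only memory load per iteration */
        out[k] = below + center + above;
        below = center;             /* shift the window up by one level */
        center = above;
    }
}
```

The window reduces loads per output from three to one, at the cost of keeping a few values live in registers across iterations.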

SLIDE 24

Stencil Inlining

for (int i = IB; i < IE; i++)
  tmp[i] = in[i] + in[i+1];
for (int i = IB; i < IE; i++)
  out[i] = tmp[i] + tmp[i-1];

Figure (global memory and registers): the first loop reads in[i] and in[i+1] and produces tmp[i]; the second loop reads tmp[i-1] and tmp[i] and produces out[i].
SLIDE 25

Stencil Inlining

for (int i = IB; i < IE; i++)
  tmp[i] = in[i] + in[i+1];
for (int i = IB; i < IE; i++)
  out[i] = tmp[i] + tmp[i-1];

After inlining:

for (int i = IB; i < IE; i++)
  out[i] = (in[i] + in[i+1]) + (in[i-1] + in[i]);

Figure (global memory and registers): after inlining, out[i] is computed directly from in[i-1], in[i], and in[i+1].
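The two versions above can be checked against each other in C. This sketch assumes padded arrays so that `in[i-1]`, `in[i+1]`, and `tmp[i-1]` are in bounds, and it starts the producer loop at `ib - 1` because the consumer reads `tmp[i - 1]`, a boundary detail the slide leaves implicit. The names `two_loops` and `inlined` are hypothetical.

```c
#include <assert.h>

/* Two-loop version: the temporary tmp is materialized in memory.
 * The producer starts at ib - 1 because the consumer reads tmp[i - 1]. */
void two_loops(const double *in, double *tmp, double *out, int ib, int ie) {
    assert(ib >= 1);
    for (int i = ib - 1; i < ie; i++)
        tmp[i] = in[i] + in[i + 1];
    for (int i = ib; i < ie; i++)
        out[i] = tmp[i] + tmp[i - 1];
}

/* Inlined version: tmp[i] and tmp[i-1] are recomputed from in on the fly,
 * so the temporary never goes through memory. */
void inlined(const double *in, double *out, int ib, int ie) {
    assert(ib >= 1);
    for (int i = ib; i < ie; i++)
        out[i] = (in[i] + in[i + 1]) + (in[i - 1] + in[i]);
}
```

Inlining removes the round trip of `tmp` through global memory but evaluates `in[i] + in[i+1]` twice (once for `out[i]` and once for `out[i+1]`), so it pays off when memory bandwidth rather than arithmetic is the bottleneck.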

SLIDE 26

stencil_function laplacian {
  storage phi;
  Do {
    return phi(i + 1) + phi(i - 1) + phi(j + 1) + phi(j - 1) - 4.0 * phi;
  }
};

stencil hori_diff_stencil {
  storage u, out;
  var lap;
  Do {
    vertical_region(k_start, k_end) {
      lap = laplacian(u);
      out = laplacian(lap);
    }
  }
};
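Read as C for a single k level, `hori_diff_stencil` applies the five-point Laplacian twice. The sketch below is an illustration under stated assumptions rather than Dawn's output: it uses a row-major nx x ny slice and the hypothetical names `hori_diff_level` and `AT`, and it makes explicit that `lap` must be computed one cell beyond the region where `out` is written.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major index of (i, j) in a slice with nx points per row. */
#define AT(i, j, nx) ((j) * (nx) + (i))

/* Five-point Laplacian, as in stencil_function laplacian. */
static double laplacian(const double *phi, size_t i, size_t j, size_t nx) {
    return phi[AT(i + 1, j, nx)] + phi[AT(i - 1, j, nx)]
         + phi[AT(i, j + 1, nx)] + phi[AT(i, j - 1, nx)]
         - 4.0 * phi[AT(i, j, nx)];
}

/* One k level of hori_diff_stencil: lap = laplacian(u); out = laplacian(lap).
 * The second call reads lap at i±1 and j±1, so lap is evaluated on a domain
 * one cell wider than out, and out is only defined two cells from the edge. */
void hori_diff_level(const double *u, double *lap, double *out,
                     size_t nx, size_t ny) {
    assert(nx >= 5 && ny >= 5);
    for (size_t j = 1; j + 1 < ny; j++)
        for (size_t i = 1; i + 1 < nx; i++)
            lap[AT(i, j, nx)] = laplacian(u, i, j, nx);
    for (size_t j = 2; j + 2 < ny; j++)
        for (size_t i = 2; i + 2 < nx; i++)
            out[AT(i, j, nx)] = laplacian(lap, i, j, nx);
}
```

Working out how far each temporary must extend beyond the output domain is the kind of extent information a stencil compiler has to infer when it inlines or splits such stencils.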

SLIDE 27

func @laplacian(%arg0: !stencil<"field:f64">) -> f64 attributes {stencil.function} {
  %0 = stencil.constant_offset 1 0 0
  %1 = stencil.read(%arg0, %0) : f64
  // ...
  %cst = constant 4.000000e+00 : f64
  %11 = stencil.constant_offset 0 0 0
  %12 = stencil.read(%arg0, %11) : f64
  %13 = stencil.mul(%cst, %12) : f64
  %14 = stencil.sub(%10, %13) : f64
  return %14 : f64
}

func @hori_diff_stencil(%arg0: !stencil<"field:f64">, %arg1: !stencil<"field:f64">) {
  %0 = stencil.temp : !stencil<"field:f64">
  %1 = stencil.context "kstart" : index
  %2 = stencil.context "kend" : index
  stencil.vertical_region(%1, %2) {
    // ...
    %6 = stencil.lambda @laplacian(%0) : (!stencil<"field:f64">) -> f64
    %7 = stencil.constant_offset 0 0 0
    %8 = stencil.read(%6, %7) : f64
    stencil.write(%arg1, %8) : f64
  }
  return
}
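In the dialect above, `stencil.read(%field, %offset)` pairs a field with a `stencil.constant_offset`, e.g. `1 0 0` for `phi(i + 1)`. The C helper below sketches what such a read could mean at run time, in the spirit of the `readTemp(field, i, j, k, offset)` calls that appear after lowering. The `field_t` layout (dense, i fastest), the (i, j, k) ordering of the offset, and the name `read_offset` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field descriptor: a dense ni x nj x nk array, i fastest. */
typedef struct {
    double *data;
    int32_t ni, nj, nk;
} field_t;

/* Read f at the current point (i, j, k) shifted by a constant offset
 * {di, dj, dk}, mirroring stencil.read(%f, stencil.constant_offset di dj dk). */
double read_offset(const field_t *f, int32_t i, int32_t j, int32_t k,
                   const int32_t offset[3]) {
    int32_t ii = i + offset[0], jj = j + offset[1], kk = k + offset[2];
    assert(ii >= 0 && ii < f->ni && jj >= 0 && jj < f->nj && kk >= 0 && kk < f->nk);
    return f->data[((size_t)kk * (size_t)f->nj + (size_t)jj) * (size_t)f->ni + (size_t)ii];
}
```

Because the offsets are compile-time constants in the IR, a compiler can fold them into address computations or reuse them to derive the halo each field needs.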

SLIDE 28

SLIDE 29

func @hori_diff_stencil(%arg0: !stencil<"field:f64">, %arg1: !stencil<"field:f64">) {
  // ...
  stencil.vertical_region(%1, %2) {
    // ...
    %22 = stencil.constant_offset 1 0 0
    %23 = stencil.read(%2, %22) : f64
    %24 = stencil.constant_offset -1 0 0
    %25 = stencil.read(%2, %24) : f64
    %26 = stencil.add(%23, %25) : f64
    // ...
    %cst_0 = constant 4.000000e+00 : f64
    %33 = stencil.constant_offset 0 0 0
    %34 = stencil.read(%2, %33) : f64
    %35 = stencil.mul(%cst_0, %34) : f64
    %36 = stencil.sub(%32, %35) : f64
    stencil.write(%0, %36) : f64
    %37 = stencil.constant_offset 0 0 0
    %38 = stencil.read(%0, %37) : f64
    stencil.write(%arg1, %38) : f64
    // ...
  }
  return
}

SLIDE 30

SLIDE 31

SLIDE 32

SLIDE 33

func @hori_diff_stencil(%arg0: !stencil<"field:f64">, %arg1: !stencil<"field:f64">) {
  // ...
  stencil.vertical_region(%3, %4) {
    %22 = stencil.constant_offset 1 0 0
    %23 = stencil.read(%2, %22) : f64
    %24 = stencil.constant_offset -1 0 0
    %25 = stencil.read(%2, %24) : f64
    %26 = stencil.add(%23, %25) : f64
    // ...
    %cst_0 = constant 4.000000e+00 : f64
    %33 = stencil.constant_offset 0 0 0
    %34 = stencil.read(%2, %33) : f64
    %35 = stencil.mul(%cst_0, %34) : f64
    %36 = stencil.sub(%32, %35) : f64
    stencil.write(%0, %36) : f64
  }
  stencil.vertical_region(%3, %4) {
    %37 = stencil.constant_offset 0 0 0
    %38 = stencil.read(%0, %37) : f64
    stencil.write(%arg1, %38) : f64
  }
  return
}

SLIDE 34

SLIDE 35

func @hori_diff_stencil(%arg0: !stencil<"field:f64">, %arg1: !stencil<"field:f64">) {
  // ...
  %49 = stencil.context "istart" : index
  %50 = stencil.context "iend" : index
  %51 = stencil.context "jstart" : index
  %52 = stencil.context "jend" : index
  affine.for %i9 = #map2(%3) to #map3(%4) {
    stencil.induction_var "K" %i9 : index
    affine.for %i10 = #map2(%49) to #map3(%50) {
      stencil.induction_var "I" %i10 : index
      affine.for %i11 = #map2(%51) to #map3(%52) {
        stencil.induction_var "J" %i11 : index
        %53 = stencil.constant_offset 0 0 0
        %54 = stencil.read(%0, %53) : f64
        stencil.write(%arg1, %54) : f64
      }
    }
  }
  return
}

SLIDE 36

SLIDE 37

func @hori_diff_stencil(%arg0: !C.voidptr, %arg1: !C.voidptr) {
  // ...
  %62 = call @istart() : () -> index
  %63 = call @iend() : () -> index
  %64 = call @jstart() : () -> index
  %65 = call @jend() : () -> index
  %c1_11 = constant 1 : index
  %66 = C.addi(%4, %c1_11) : index
  C.for (%i9 = %3 to %66) {
    stencil.induction_var "K" %i9 : index
    %c1_12 = constant 1 : index
    %67 = C.addi(%63, %c1_12) : index
    C.for (%i10 = %62 to %67) {
      stencil.induction_var "I" %i10 : index
      %c1_13 = constant 1 : index
      %68 = C.addi(%65, %c1_13) : index
      C.for (%i11 = %64 to %68) {
        stencil.induction_var "J" %i11 : index
        %69 = stencil.constant_offset 0 0 0
        %70 = call @readTemp(%0, %i10, %i11, %i9, %69) : // ...
        call @write(%arg1, %70, %i10, %i11, %i9) : // ...
      }
    }
  }
  return
}

SLIDE 38

// gridtools boilerplate
void hori_diff_stencil(void *v_0, void *v_1) {
  // ...
  int32_t v_69 = istart();
  int32_t v_70 = iend();
  int32_t v_71 = jstart();
  int32_t v_72 = jend();
  for (int32_t i_0 = v_5; i_0 < v_12; i_0++) {
    int32_t v_73 = v_70 + v_11;
    for (int32_t i_1 = v_69; i_1 < v_73; i_1++) {
      int32_t v_74 = v_72 + v_11;
      for (int32_t i_2 = v_71; i_2 < v_74; i_2++) {
        int32_t v_75[] = {0, 0, 0};
        double v_76 = readTemp(v_2, i_1, i_2, i_0, v_75);
        write(v_1, v_76, i_1, i_2, i_0);
      }
    }
  }
  return;
}
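The generated loops use an exclusive upper bound of `end + 1` (`C.addi(%4, %c1_11)` in the C dialect, `v_70 + v_11` here, where `v_11` is presumably 1), which suggests that context bounds such as `iend()` are inclusive. The self-contained C sketch below replays the same K, I, J loop order under that assumption; `bounds_t` and `copy_temp` are hypothetical stand-ins for the context calls and the `readTemp`/`write` pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inclusive iteration-space bounds, standing in for the
 * istart()/iend()/jstart()/jend() context calls of the generated code. */
typedef struct { int32_t istart, iend, jstart, jend, kstart, kend; } bounds_t;

/* Copy tmp into out over the inclusive box b, in the generated loop order:
 * K outermost, then I, then J. Each inclusive end becomes an exclusive loop
 * bound by adding 1. Fields are ni x nj x nk arrays with i fastest. */
void copy_temp(const double *tmp, double *out, bounds_t b, int32_t ni, int32_t nj) {
    assert(b.istart >= 0 && b.jstart >= 0 && b.kstart >= 0);
    for (int32_t k = b.kstart; k < b.kend + 1; k++)
        for (int32_t i = b.istart; i < b.iend + 1; i++)
            for (int32_t j = b.jstart; j < b.jend + 1; j++) {
                int32_t n = (k * nj + j) * ni + i;
                out[n] = tmp[n];   /* read at offset {0, 0, 0}, then write */
            }
}
```

With i fastest in memory, putting J innermost strides through memory by `ni` per iteration; a later optimization pass would be free to reorder these loops because the copy has no loop-carried dependencies.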

SLIDE 39

Low-level Dialect

stencil.iir {
  stencil.stencil(%arg0: !stencil<"field:f64">, %arg1: !stencil<"field:f64">) {
    stencil.multi_stage "Parallel" {
      stencil.stage {
        stencil.do_method [0, 0, 60, 0] {
          %0 = stencil.field_access %arg1 [0, 0, 0] : !stencil<"ptr:f64">
          %1 = stencil.field_access %arg0 [0, 0, 0] : !stencil<"ptr:f64">
          %2 = stencil.get_value %0 : f64
          %3 = stencil.get_value %1 : f64
          %4 = addf %2, %3 : f64
          %cst = constant 4.000000e+00 : f64
          %5 = mulf %4, %cst : f64
          stencil.write %0, %5 : f64
        }
      }
    }
  }
}
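Read line by line, the `do_method` body loads `%arg1` and `%arg0` at offset `[0, 0, 0]`, adds them, multiplies by 4.0, and writes the result back to `%arg1`. A plain C equivalent over a flattened array might look like the sketch below. The flattening and the name `do_method_body` are assumptions, and since the slide does not spell out the meaning of the `[0, 0, 60, 0]` interval, it is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Pointwise update from the do_method body: arg1 = (arg1 + arg0) * 4.0.
 * Every access uses offset [0, 0, 0], so points are independent and can be
 * processed in any order, consistent with the "Parallel" multi_stage. */
void do_method_body(const double *arg0, double *arg1, size_t n) {
    assert(arg0 != NULL && arg1 != NULL);
    for (size_t p = 0; p < n; p++)
        arg1[p] = (arg1[p] + arg0[p]) * 4.0;
}
```

Because the update only touches the center point, the in-place write to `arg1` is safe without a separate temporary.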

SLIDE 40

GPU Dialect Extensions

  • Support for warp-level primitives, such as shuffling
  • Access to shared memory
  • Support for parallel kernel execution
  • Inter-node communication for distributed GPU applications

SLIDE 41

Conclusion

SLIDE 42