
A Compiler Intermediate Representation for Stencils



  1. spcl.inf.ethz.ch @spcl_eth
     Jean-Michel Gorius, Tobias Wicky, Tobias Grosser, and Tobias Gysi
     A Compiler Intermediate Representation for Stencils
     “Climate change is now affecting every country on every continent. It is disrupting national economies and affecting lives, costing people, communities and countries dearly today and even more tomorrow. Weather patterns are changing, sea levels are rising, weather events are becoming more extreme and greenhouse gas emissions are now at their highest levels in history.” - United Nations, Sustainable Development Goals

  2. Open Climate Compiler Initiative

  3. COSMO Atmospheric Model
     • Regional atmospheric model used by 7 national weather services
     • Implements many different stencil programs

  4. Resolution (35m): What resolution is needed to predict whether snow falls out of the banner cloud at the Matterhorn?


  6. Resolution (70m)

  7. Resolution (140m)

  8. Resolution (280m)

  9. Resolution (560m)

  10. Resolution (1.1km – Weather Forecast Today)

  11. Resolution (2.2km – Weather Forecast 2015)

  12. Achieving High Performance, Portability, and Productivity
     • 1998, Fortran code: optimized for vector machines
     • 2010, Stella/GridTools (DSL embedded in C++): GPU and CPU support; performance & portability
     • 2015: 1st GPU model running in COSMO production
     • 2017, Dawn (GTClang), a domain-specific compiler: language-agnostic front end; powerful analysis and optimization passes; productivity

  13. Domain-Science vs Computer-Science
     • element-wise computation
     • solve PDE
     • fixed neighborhood
     • finite differences
     • structured grid
     lap(i,j) = -4.0 * in(i,j) + in(i-1,j) + in(i+1,j) + in(i,j-1) + in(i,j+1)
     (figure: the lap stencil applied to the in field)
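
The Laplacian formula on this slide maps directly onto a loop nest. The following is a minimal C++ sketch, not code from the talk; the `Grid` alias, the `[i][j]` layout, and leaving the outermost ring of points at zero are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D field layout: in[i][j].
using Grid = std::vector<std::vector<double>>;

// lap(i,j) = -4.0*in(i,j) + in(i-1,j) + in(i+1,j) + in(i,j-1) + in(i,j+1),
// evaluated on interior points only; the outermost ring of lap stays 0.
// Each output point reads only `in`, so the iterations are independent
// and could run in any order or in parallel.
Grid laplacian(const Grid& in) {
    assert(!in.empty());
    const std::size_t ni = in.size(), nj = in[0].size();
    Grid lap(ni, std::vector<double>(nj, 0.0));
    for (std::size_t i = 1; i + 1 < ni; ++i)
        for (std::size_t j = 1; j + 1 < nj; ++j)
            lap[i][j] = -4.0 * in[i][j] + in[i - 1][j] + in[i + 1][j]
                      + in[i][j - 1] + in[i][j + 1];
    return lap;
}
```

Because each `lap[i][j]` reads only `in`, the loops carry no dependencies, which is the property the next slide attributes to finite-difference stencils.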

  14. Algorithmic Motifs – Finite Differences
     • stencils (no loop-carried dependencies)
     • mostly horizontal dependencies (in i and j)

  15. Algorithmic Motifs – Tridiagonal Systems
     • vertical dependencies (along k)
     • loop-carried dependencies
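
To make the contrast with the horizontal stencils concrete, here is a sketch of a tridiagonal solve along k for a single vertical column using the Thomas algorithm. The slides do not say which solver COSMO uses; the choice of algorithm, the function name `thomas_solve`, and the absence of pivoting (which assumes a diagonally dominant system) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves a[k]*x[k-1] + b[k]*x[k] + c[k]*x[k+1] = d[k] for k = 0..n-1
// (a[0] and c[n-1] are ignored) with the Thomas algorithm, no pivoting.
// Both sweeps carry a dependency from one k level to the next, so the
// k loop must run sequentially; only independent (i, j) columns can be
// processed in parallel.
std::vector<double> thomas_solve(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 const std::vector<double>& c,
                                 std::vector<double> d) {
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && d.size() == n);
    std::vector<double> cp(n);  // modified super-diagonal
    // Forward sweep: level k uses the results of level k-1.
    cp[0] = c[0] / b[0];
    d[0] = d[0] / b[0];
    for (std::size_t k = 1; k < n; ++k) {
        const double m = b[k] - a[k] * cp[k - 1];
        cp[k] = c[k] / m;
        d[k] = (d[k] - a[k] * d[k - 1]) / m;
    }
    // Backward substitution: level k uses the result of level k+1.
    std::vector<double> x(n);
    x[n - 1] = d[n - 1];
    for (std::size_t k = n - 1; k-- > 0;)
        x[k] = d[k] - cp[k] * x[k + 1];
    return x;
}
```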

  16. Architecture of the Dawn Compiler: DSL Code → Front End → High-Level IR
     Example DSL code passed to the front end:

         stencil average {
           storage in, out;
           Do {
             vertical_region(kstart, kend) {
               out[i,j,k] = 0.5 * (in[i-1,j,k] + in[i+1,j,k]);
             }
           }
         };
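
As a reference for what the front end must capture, the `average` stencil can be written as a plain C++ loop nest. This sketch assumes a hypothetical `Field3D` container with i as the fastest-varying index, treats `vertical_region(kstart, kend)` as covering every k level, and skips the two i boundary points, since the slide does not specify boundary handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D field; element (i, j, k) lives at (k * nj + j) * ni + i.
struct Field3D {
    std::size_t ni, nj, nk;
    std::vector<double> data;
    Field3D(std::size_t ni_, std::size_t nj_, std::size_t nk_)
        : ni(ni_), nj(nj_), nk(nk_), data(ni_ * nj_ * nk_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data[(k * nj + j) * ni + i];
    }
    double operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(k * nj + j) * ni + i];
    }
};

// out[i,j,k] = 0.5 * (in[i-1,j,k] + in[i+1,j,k]) on interior i points.
void average(const Field3D& in, Field3D& out) {
    assert(in.ni == out.ni && in.nj == out.nj && in.nk == out.nk);
    for (std::size_t k = 0; k < in.nk; ++k)  // assumed: vertical_region spans all k
        for (std::size_t j = 0; j < in.nj; ++j)
            for (std::size_t i = 1; i + 1 < in.ni; ++i)
                out(i, j, k) = 0.5 * (in(i - 1, j, k) + in(i + 1, j, k));
}
```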

  17. Architecture of the Dawn Compiler: DSL Code → Front End → High-Level IR → Parallelizer → Low-Level IR
     Parallelizer:
     • add synchronization
     • solve data races
     • safety checks
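
One hazard a parallelizer must handle is a statement that reads neighbors which other iterations of the same loop write. The C++ sketch below is illustrative only and is not Dawn's algorithm; the function names are hypothetical. It contrasts a naive in-place update, whose result depends on iteration order, with a two-stage version that writes into a temporary and synchronizes before copying back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive in-place loop: at step i, a[i-1] has already been overwritten,
// so the result depends on iteration order (a data race once parallelized).
void smooth_in_place(std::vector<double>& a) {
    for (std::size_t i = 1; i + 1 < a.size(); ++i)
        a[i] = 0.5 * (a[i - 1] + a[i + 1]);
}

// Race-free version: stage 1 writes into a temporary; after a
// synchronization point, stage 2 copies the result back. Every stage-1
// iteration reads only original values, so its i loop could run in
// any order or in parallel.
void smooth_two_stage(std::vector<double>& a) {
    std::vector<double> tmp(a);  // stage 1 output
    for (std::size_t i = 1; i + 1 < a.size(); ++i)
        tmp[i] = 0.5 * (a[i - 1] + a[i + 1]);
    // --- synchronization point between the two stages ---
    for (std::size_t i = 1; i + 1 < a.size(); ++i)  // stage 2
        a[i] = tmp[i];
}
```

With a = {0, 4, 0, 4, 0}, the two-stage version yields {0, 0, 4, 0, 0}, matching parallel semantics in which every point reads the old values, while the in-place loop yields {0, 0, 2, 1, 0}.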

  18. Architecture of the Dawn Compiler: DSL Code → Front End → High-Level IR → Parallelizer → Low-Level IR → Optimizer → Low-Level IR
     Optimizer:
     • data locality
     • caching
     • memory footprint

  19. Architecture of the Dawn Compiler: DSL Code → Front End → High-Level IR → Parallelizer → Low-Level IR → Optimizer → Low-Level IR → Code Generator → Optimized Code
     Code Generator backends:
     • CUDA
     • GridTools
     • Debug

  20. Stencil DSLs: SDSL, DACE, PATUS, Firedrake, Simflowny, Halide, Nebo-Wasatch, Snowflake, gtclang, Liszt, Lift, Physis, MSL, AMRStencil, GridTools, Stella, NUMA, ExaStencil, Hipacc, OpenSBLI, Stincilla, Multi-Stencil, PADS


  22. Climate Stencil Compilation with MLIR
     Lowering pipeline: DSL frontend → Stencil dialect (Stencil-to-Stencil transformations) → Affine → Std Ops → GPU → NVVM / ROCDL

  23. GPU Execution Model and Optimizations
     Execution model: threads (grouped in blocks) cover the i and j dimensions; k is a sequential loop.
     Optimizations: loop tiling, overlapped tiling, shared memory, sliding window in registers, loop shifting & fusion, stencil inlining, vectorization
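
Because each thread traverses k sequentially, values read along k can be reused across iterations. The sketch below shows the idea behind a sliding window in registers for one vertical column in plain C++; the stencil `out[k] = in[k-1] + in[k] + in[k+1]` and the function names are hypothetical examples, and on a GPU the same loop would run once per (i, j) thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Straightforward version: three loads of `in` per output level.
void vertical_sum_naive(const std::vector<double>& in, std::vector<double>& out) {
    assert(in.size() == out.size());
    for (std::size_t k = 1; k + 1 < in.size(); ++k)
        out[k] = in[k - 1] + in[k] + in[k + 1];
}

// Sliding-window version: in[k-1], in[k], in[k+1] are kept in local
// variables (registers) and shifted by one level per iteration, so each
// element of `in` is loaded only once.
void vertical_sum_window(const std::vector<double>& in, std::vector<double>& out) {
    assert(in.size() == out.size());
    if (in.size() < 3) return;
    double below = in[0], center = in[1];
    for (std::size_t k = 1; k + 1 < in.size(); ++k) {
        const double above = in[k + 1];  // the only new load per level
        out[k] = below + center + above;
        below = center;  // shift the window up by one level
        center = above;
    }
}
```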

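  24. Stencil Inlining (before): tmp is stored in global memory between the two loops

         // tmp[IB-1] is also computed because the second loop reads tmp[i-1]
         for (int i = IB - 1; i < IE; i++)
           tmp[i] = in[i] + in[i+1];
         for (int i = IB; i < IE; i++)
           out[i] = tmp[i] + tmp[i-1];

     (figure: tmp[i] reads in[i] and in[i+1]; out[i] reads tmp[i-1] and tmp[i]; legend: register, global memory)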

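  25. Stencil Inlining (after): the tmp computation is substituted into the out loop, so tmp no longer needs to be stored in global memory
     Original:

         for (int i = IB - 1; i < IE; i++)  // also covers tmp[IB-1]
           tmp[i] = in[i] + in[i+1];
         for (int i = IB; i < IE; i++)
           out[i] = tmp[i] + tmp[i-1];

     Inlined:

         for (int i = IB; i < IE; i++)
           out[i] = (in[i] + in[i+1]) + (in[i-1] + in[i]);

     (figure: out[i] reads in[i-1], in[i], and in[i+1] directly; legend: register, global memory)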
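
The two inlining slides can be checked against each other with a small C++ program. This sketch assumes 1D `std::vector<double>` fields and hypothetical function names; note that `tmp` must also be computed at `IB - 1`, because the second loop reads `tmp[i-1]` at `i = IB`. Since the inlined expression evaluates the same sums in the same order, both versions produce bit-identical results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before inlining: tmp is materialized as a separate array
// (global memory on a GPU).
std::vector<double> two_loops(const std::vector<double>& in, int IB, int IE) {
    assert(IB >= 1 && IE + 1 <= static_cast<int>(in.size()));
    std::vector<double> tmp(in.size(), 0.0), out(in.size(), 0.0);
    for (int i = IB - 1; i < IE; i++)  // includes IB-1, read as tmp[i-1] below
        tmp[i] = in[i] + in[i + 1];
    for (int i = IB; i < IE; i++)
        out[i] = tmp[i] + tmp[i - 1];
    return out;
}

// After inlining: each tmp access is replaced by the expression that
// defines it, so no temporary array is needed.
std::vector<double> inlined(const std::vector<double>& in, int IB, int IE) {
    assert(IB >= 1 && IE + 1 <= static_cast<int>(in.size()));
    std::vector<double> out(in.size(), 0.0);
    for (int i = IB; i < IE; i++)
        out[i] = (in[i] + in[i + 1]) + (in[i - 1] + in[i]);
    return out;
}
```

The trade-off: inlining removes the `tmp` array and its memory traffic, at the cost of computing each `in[i] + in[i+1]` sum twice (as the left term of `out[i]` and as the right term of `out[i+1]`).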

  26. Horizontal diffusion in the DSL:

         stencil_function laplacian {
           storage phi;
           Do {
             return phi(i + 1) + phi(i - 1) + phi(j + 1) + phi(j - 1) - 4.0 * phi;
           }
         };

         stencil hori_diff_stencil {
           storage u, out;
           var lap;
           Do {
             vertical_region(k_start, k_end) {
               lap = laplacian(u);
               out = laplacian(lap);
             }
           }
         };
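
Composing the two Laplacian calls enlarges the region on which `lap` must be valid: the outer call reads `lap` at offsets of ±1, so `lap` has to be computed one point beyond the output domain. Below is a minimal C++ sketch of `hori_diff_stencil`, assuming a hypothetical `Grid` alias indexed `[i][j]`, a single k level, and boundary points left at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D field layout: grid[i][j].
using Grid = std::vector<std::vector<double>>;

// Five-point Laplacian on points [lo, n - lo) in both dimensions;
// other points of `out` are left unchanged.
static void laplacian(const Grid& phi, Grid& out, std::size_t lo) {
    assert(lo >= 1);
    const std::size_t ni = phi.size(), nj = phi[0].size();
    for (std::size_t i = lo; i + lo < ni; ++i)
        for (std::size_t j = lo; j + lo < nj; ++j)
            out[i][j] = phi[i + 1][j] + phi[i - 1][j] + phi[i][j + 1]
                      + phi[i][j - 1] - 4.0 * phi[i][j];
}

// out = laplacian(laplacian(u)): lap is computed with a halo of 1,
// out with a halo of 2, so every lap value the outer call reads exists.
Grid hori_diff(const Grid& u) {
    assert(u.size() >= 5 && u[0].size() >= 5);
    Grid lap(u.size(), std::vector<double>(u[0].size(), 0.0));
    Grid out = lap;
    laplacian(u, lap, 1);
    laplacian(lap, out, 2);
    return out;
}
```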

  27. The same stencils in the MLIR stencil dialect:

         func @laplacian(%arg0: !stencil<"field:f64">) -> f64 attributes {stencil.function} {
           %0 = stencil.constant_offset 1 0 0
           %1 = stencil.read(%arg0, %0) : f64
           // ...
           %cst = constant 4.000000e+00 : f64
           %11 = stencil.constant_offset 0 0 0
           %12 = stencil.read(%arg0, %11) : f64
           %13 = stencil.mul(%cst, %12) : f64
           %14 = stencil.sub(%10, %13) : f64
           return %14 : f64
         }

         func @hori_diff_stencil(%arg0: !stencil<"field:f64">, %arg1: !stencil<"field:f64">) {
           %0 = stencil.temp : !stencil<"field:f64">
           %1 = stencil.context "kstart" : index
           %2 = stencil.context "kend" : index
           stencil.vertical_region(%1, %2) {
             // ...
             %6 = stencil.lambda @laplacian(%0) : (!stencil<"field:f64">) -> f64
             %7 = stencil.constant_offset 0 0 0
             %8 = stencil.read(%6, %7) : f64
             stencil.write(%arg1, %8) : f64
           }
           return
         }


  29. hori_diff_stencil after inlining the laplacian function:

         func @hori_diff_stencil(%arg0: !stencil<"field:f64">, %arg1: !stencil<"field:f64">) {
           // ...
           stencil.vertical_region(%1, %2) {
             // ...
             %22 = stencil.constant_offset 1 0 0
             %23 = stencil.read(%2, %22) : f64
             %24 = stencil.constant_offset -1 0 0
             %25 = stencil.read(%2, %24) : f64
             %26 = stencil.add(%23, %25) : f64
             // ...
             %cst_0 = constant 4.000000e+00 : f64
             %33 = stencil.constant_offset 0 0 0
             %34 = stencil.read(%2, %33) : f64
             %35 = stencil.mul(%cst_0, %34) : f64
             %36 = stencil.sub(%32, %35) : f64
             stencil.write(%0, %36) : f64
             %37 = stencil.constant_offset 0 0 0
             %38 = stencil.read(%0, %37) : f64
             stencil.write(%arg1, %38) : f64
             // ...
           }
           return
         }

