Should I port my code to a DSL?

  1. Should I port my code to a DSL? Bahareh Davani · Ferran Marti · Laleh Beni · Saikiran Ramanan · Feng Liu · Aparna Chandramowlishwaran. October 27, 2017, Schloss Dagstuhl.

  2. https://en.wikipedia.org/wiki/Newport_Beach,_California

  3. Context: HiPer (“High Performance Turbulent Flow Simulations”)

  4. Context: MoBo (“Moving Boundaries”). Citation: “Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures.” In SC’10. 
 Winner, Gordon Bell Prize. http://dx.doi.org/10.1109/SC.2010.42

  5. Deformable red blood cells. Prior work with the same physical fidelity:
 • 1,200 cells: sequential + integral equations. Zinchenko et al. (2003)
 • 14,000 cells: IBM BG/P + Lattice Boltzmann, O(10k) unknowns/cell. Clausen et al. (2010)
 MoBo: 260 million cells (90 billion unknowns) on 200k cores (Jaguar @ ORNL). CPU, GPU + integral equations + implicit AMR, O(100) unknowns/cell.
 Key to scaling: optimal n-body methods based on the fast multipole method (FMM) on highly non-uniform domains.
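
For scale, the computation that FMM is designed to avoid is the direct all-pairs sum, which costs O(N²) for N points. The following is a minimal sketch of that baseline only, assuming a simple 1/r kernel; the function name `direct_potentials` and the sample data are hypothetical and are not taken from MoBo, which uses integral-equation kernels for the flow.

```python
import math

def direct_potentials(points, charges):
    """Evaluate phi_i = sum over j != i of q_j / |x_i - x_j| by direct summation.

    This is the O(N^2) baseline; FMM approximates the same sums in
    roughly O(N) work by grouping far-away sources.
    """
    n = len(points)
    phi = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the singular self-interaction
            r = math.dist(points[i], points[j])
            phi[i] += charges[j] / r
    return phi

# Hypothetical sample data: three unit charges in 3D.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
charges = [1.0, 1.0, 1.0]
phi = direct_potentials(points, charges)
```

Doubling the number of points quadruples the work in this loop nest, which is why an optimal n-body method matters at 90 billion unknowns.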

  6. Why N-body methods? • One of the original seven dwarfs or motifs • FMM is listed among the top 10 algorithms with the greatest influence in the 20th century • EM is one of the top 10 algorithms with the highest impact in 
 data mining • Applications: machine learning, computer vision, computational geometry, scientific computing, …

  7. Tunnel vision? Quotes: “Everyone is doing stencils.” Anonymous Wolverine. “Stencils are easy, they are structured.” Anonymous Chipmunk. “We need separation of concerns” (drink!). Anonymous Chupacabras. “We need better performance models.” Anonymous Axolotl. Questions: Do current frameworks capture stencil patterns in “real applications”? What is the gap between stencil DSLs and hand-optimized code for “real applications”? What is the right separation of concerns? Story time!

  9. Computational fluid dynamics simulations

  10. Governing equations • 3D Unsteady Reynolds-Averaged Navier-Stokes (URANS) equations • Dual time-stepping scheme • Pseudo-time marching: multi-stage Runge-Kutta scheme • Marched to a steady state in pseudo time • Spatial discretization of the residual • 2nd-order accurate
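
The dual time-stepping idea can be sketched on a scalar model problem. The sketch below assumes a backward-Euler discretization in physical time and a four-stage low-storage Runge-Kutta scheme with coefficients 1/4, 1/3, 1/2, 1 in pseudo time; the slides state neither choice, and `dual_time_step`, `rhs`, and the test equation du/dt = -λu are hypothetical stand-ins for the URANS solver.

```python
def dual_time_step(u_n, dt, rhs, dtau, tol=1e-12, max_iters=1000):
    """Advance one physical time step by marching to steady state in pseudo time.

    The implicit (backward-Euler) equation (u - u_n)/dt - rhs(u) = 0 is
    treated as the residual of a pseudo-time problem du/dtau = -R(u).
    """
    alphas = (0.25, 1.0 / 3.0, 0.5, 1.0)  # assumed multi-stage RK coefficients

    def unsteady_residual(u):
        return (u - u_n) / dt - rhs(u)

    u = u_n
    for _ in range(max_iters):
        if abs(unsteady_residual(u)) < tol:
            break  # steady in pseudo time: the implicit equation is satisfied
        u0 = u
        for alpha in alphas:
            # Every stage restarts from u0 and uses the latest residual.
            u = u0 - alpha * dtau * unsteady_residual(u)
    return u

# Model problem du/dt = -lam * u, integrated for 10 physical steps.
lam, dt = 2.0, 0.1
dtau = 1.0 / (1.0 / dt + lam)  # pseudo-time step inside the RK stability limit
u = 1.0
for _ in range(10):
    u = dual_time_step(u, dt, lambda v: -lam * v, dtau)
```

For this linear problem each converged step reproduces the backward-Euler update u_n / (1 + λ·Δt), so the inner pseudo-time loop can be checked against a closed form.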

  11. Stencil patterns • Cell-centered stencils: most well-studied in literature • Vertex-centered stencils: more complex memory access pattern, more memory-bound than cell-centered stencils
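
The difference in access pattern can be illustrated on a uniform 2D grid. This is only a sketch under one common reading of the two terms; the slides do not define them, and the functions, indexing convention (vertex (i, j) is the corner shared by cells (i-1, j-1), (i-1, j), (i, j-1), and (i, j)), and sample grid are all hypothetical.

```python
def cell_centered_sum(cells, i, j):
    """5-point stencil: a cell reads itself and its four face neighbours."""
    return (cells[i][j]
            + cells[i - 1][j] + cells[i + 1][j]
            + cells[i][j - 1] + cells[i][j + 1])

def vertex_centered_avg(cells, i, j):
    """Value at vertex (i, j), averaged from the four cells sharing that vertex."""
    return 0.25 * (cells[i - 1][j - 1] + cells[i - 1][j]
                   + cells[i][j - 1] + cells[i][j])

# Hypothetical 4x4 grid of cell values: cells[i][j] = 4*i + j.
cells = [[4 * i + j for j in range(4)] for i in range(4)]
centre = cell_centered_sum(cells, 1, 1)    # reads cells along the two axes
corner = vertex_centered_avg(cells, 1, 1)  # reads the diagonal neighbour too
```

The vertex-centered form produces values on a grid offset from the cell grid and touches diagonal neighbours, which is one way the memory access pattern becomes more complex than in the cell-centered case.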

  13. Single- and multi-core optimizations (cylinder flow with 2 million cells). [Figure: three plots of speedup versus number of threads, with NUMA and SMT regions marked. Legend: +Strength Reduction, +Fusion, +Parallelism, +NUMA, +Blocking, +SIMD Transformations, +SIMD. Haswell (1 to 32 threads): ~105x. Abu Dhabi (1 to 64 threads): ~159x. Broadwell (1 to 88 threads): ~160x.]
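
Two of the single-core steps named in the figure legend, strength reduction and fusion, can be sketched on a toy kernel. The slide does not say which expressions or loops were transformed, so the kernel, the names `baseline` and `optimized`, and the data below are hypothetical, and Python is used only to show the shape of the transformation.

```python
import math

def baseline(q, vol):
    """Two separate passes, using a power and a division per element."""
    flux = [x ** 2 / v for x, v in zip(q, vol)]   # pass 1 writes a temporary
    return [0.5 * f for f in flux]                # pass 2 re-reads it

def optimized(q, inv_vol):
    """One fused pass with cheaper operations.

    Strength reduction: x * x replaces x ** 2, and a multiply by a
    precomputed reciprocal replaces the division.
    Fusion: both passes are merged, so no temporary array is stored.
    """
    return [0.5 * x * x * iv for x, iv in zip(q, inv_vol)]

# Hypothetical data; the reciprocals are computed once, outside the hot loop.
q = [1.0, 2.5, -3.0, 4.2]
vol = [2.0, 0.5, 1.25, 3.0]
inv_vol = [1.0 / v for v in vol]

ref = baseline(q, vol)
fast = optimized(q, inv_vol)
```

The two versions agree only up to floating-point rounding, since multiplying by a reciprocal is not bit-identical to dividing; a comparison should therefore use a tolerance such as `math.isclose`.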
