GPU Acceleration on the 3D Elastic RTM Method

Lin Gan, Tsinghua University (High Performance Geo-Computing Group). May 8th, 2017, GTC 2017.


  1. High Performance Geo-Computing Group
     GPU Acceleration on the 3D Elastic RTM Method
     Lin Gan, Tsinghua University
     May 8th, 2017, GTC 2017

  2-4. About Tsinghua HPGC
     • High Performance Geo-Computing Group
       – Interdisciplinary research group
       – High performance, high resolution geo-science acceleration
         [Diagram labels: data computing, climate change, seismic modeling, high performance computing]
       – The most advanced HPC platforms
         • Multi-core CPU, many-core GPU & MIC
         • Reconfigurable data flow engines: Maxeler DFEs, IBM OpenPower, Intel Xeon+FPGA
         • Supercomputers
           – Tianhe-1A: 7168 CPU-GPU nodes, 4.7 PFlops Rpeak
           – Tianhe-2: 16,000 CPU-3MIC nodes, 54.9 PFlops Rpeak
           – Tsinghua Explore100: 740 CPU nodes, 4 TFlops Rpeak
       – Cooperation and Sponsorship
     GPU Acceleration on Elastic RTM

  5. About This Work
     • HPGC-SEP Summer Exchange Project
       – Advisors: Dr. Haohuan Fu, Dr. Robert Clapp, and Prof. Biondo Biondi
       – Special thanks to Gustavo Alves and Ettore Biondi
     • Achievements on GPU
       – 10x speedup accelerating a 2D elastic RTM code over 24 CPU cores
       – Implementation of a 3D elastic RTM kernel with adjustable interfaces
       – 27x speedup accelerating the 3D RTM kernel over 24 CPU cores

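  6. 3D Elastic RTM Stencils
     • State variables (data) and the attributes (model)
       – Data: particle velocities $V_x, V_y, V_z$; normal stresses $\sigma_{xx}, \sigma_{yy}, \sigma_{zz}$; shear stresses $\sigma_{xy}, \sigma_{xz}, \sigma_{yz}$
       – Model: density $\rho$, Lamé parameters $\lambda$ (Lambda) and $\mu$ (Mu)
         $\rho = \dfrac{\text{mass}}{\Delta x\,\Delta y\,\Delta z}$ in kg/m$^3$
         $\lambda = \rho\,\dfrac{\partial P}{\partial \rho}$ in GPa
         $\mu = \dfrac{\text{Force}/\text{Area}}{\Delta x/\text{length}}$ in GPa
       [Diagram: Forward maps Model to Data; Adjoint maps Data to Model]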

  7. 3D Elastic RTM Stencils
     • Forward and Adjoint
       [Figure: time axes from t=0 to t=Nt in steps of $\Delta t$ for the Forward (Model to Data) and Adjoint (Data to Model) passes; labels: Data, Model, Memory]

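  8. 3D Elastic RTM Stencils
     • Wave Equations
       $$
       \begin{aligned}
       \frac{\partial}{\partial t} V_x(\mathbf{x},t) &= \frac{1}{\rho(\mathbf{x})}\left[\frac{\partial}{\partial x}\sigma_{xx}(\mathbf{x},t) + \frac{\partial}{\partial y}\sigma_{xy}(\mathbf{x},t) + \frac{\partial}{\partial z}\sigma_{xz}(\mathbf{x},t) + S_x(\mathbf{x},t)\right] \\
       \frac{\partial}{\partial t} V_y(\mathbf{x},t) &= \frac{1}{\rho(\mathbf{x})}\left[\frac{\partial}{\partial x}\sigma_{xy}(\mathbf{x},t) + \frac{\partial}{\partial y}\sigma_{yy}(\mathbf{x},t) + \frac{\partial}{\partial z}\sigma_{yz}(\mathbf{x},t) + S_y(\mathbf{x},t)\right] \\
       \frac{\partial}{\partial t} V_z(\mathbf{x},t) &= \frac{1}{\rho(\mathbf{x})}\left[\frac{\partial}{\partial x}\sigma_{xz}(\mathbf{x},t) + \frac{\partial}{\partial y}\sigma_{yz}(\mathbf{x},t) + \frac{\partial}{\partial z}\sigma_{zz}(\mathbf{x},t) + S_z(\mathbf{x},t)\right] \\
       \frac{\partial}{\partial t} \sigma_{xx}(\mathbf{x},t) &= \left[\lambda(\mathbf{x}) + 2\mu(\mathbf{x})\right]\frac{\partial}{\partial x}V_x(\mathbf{x},t) + \lambda(\mathbf{x})\left[\frac{\partial}{\partial y}V_y(\mathbf{x},t) + \frac{\partial}{\partial z}V_z(\mathbf{x},t)\right] + S_{xx}(\mathbf{x},t) \\
       \frac{\partial}{\partial t} \sigma_{yy}(\mathbf{x},t) &= \left[\lambda(\mathbf{x}) + 2\mu(\mathbf{x})\right]\frac{\partial}{\partial y}V_y(\mathbf{x},t) + \lambda(\mathbf{x})\left[\frac{\partial}{\partial x}V_x(\mathbf{x},t) + \frac{\partial}{\partial z}V_z(\mathbf{x},t)\right] + S_{yy}(\mathbf{x},t) \\
       \frac{\partial}{\partial t} \sigma_{zz}(\mathbf{x},t) &= \left[\lambda(\mathbf{x}) + 2\mu(\mathbf{x})\right]\frac{\partial}{\partial z}V_z(\mathbf{x},t) + \lambda(\mathbf{x})\left[\frac{\partial}{\partial x}V_x(\mathbf{x},t) + \frac{\partial}{\partial y}V_y(\mathbf{x},t)\right] + S_{zz}(\mathbf{x},t) \\
       \frac{\partial}{\partial t} \sigma_{xy}(\mathbf{x},t) &= \mu(\mathbf{x})\left[\frac{\partial}{\partial y}V_x(\mathbf{x},t) + \frac{\partial}{\partial x}V_y(\mathbf{x},t)\right] + S_{xy}(\mathbf{x},t) \\
       \frac{\partial}{\partial t} \sigma_{xz}(\mathbf{x},t) &= \mu(\mathbf{x})\left[\frac{\partial}{\partial z}V_x(\mathbf{x},t) + \frac{\partial}{\partial x}V_z(\mathbf{x},t)\right] + S_{xz}(\mathbf{x},t) \\
       \frac{\partial}{\partial t} \sigma_{yz}(\mathbf{x},t) &= \mu(\mathbf{x})\left[\frac{\partial}{\partial z}V_y(\mathbf{x},t) + \frac{\partial}{\partial y}V_z(\mathbf{x},t)\right] + S_{yz}(\mathbf{x},t)
       \end{aligned}
       $$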

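  9. 3D Elastic RTM Stencils
     • For time: 2nd-order F.D. approximation
       $$
       \begin{aligned}
       V_x^{t+\Delta t/2}(\mathbf{x}) &= V_x^{t-\Delta t/2}(\mathbf{x}) + \frac{\Delta t}{\rho(\mathbf{x})}\left[\frac{\partial}{\partial x}\sigma_{xx}^{t}(\mathbf{x}) + \frac{\partial}{\partial y}\sigma_{xy}^{t}(\mathbf{x}) + \frac{\partial}{\partial z}\sigma_{xz}^{t}(\mathbf{x}) + S_x^{t}(\mathbf{x})\right] \\
       V_y^{t+\Delta t/2}(\mathbf{x}) &= V_y^{t-\Delta t/2}(\mathbf{x}) + \frac{\Delta t}{\rho(\mathbf{x})}\left[\frac{\partial}{\partial x}\sigma_{xy}^{t}(\mathbf{x}) + \frac{\partial}{\partial y}\sigma_{yy}^{t}(\mathbf{x}) + \frac{\partial}{\partial z}\sigma_{yz}^{t}(\mathbf{x}) + S_y^{t}(\mathbf{x})\right] \\
       V_z^{t+\Delta t/2}(\mathbf{x}) &= V_z^{t-\Delta t/2}(\mathbf{x}) + \frac{\Delta t}{\rho(\mathbf{x})}\left[\frac{\partial}{\partial x}\sigma_{xz}^{t}(\mathbf{x}) + \frac{\partial}{\partial y}\sigma_{yz}^{t}(\mathbf{x}) + \frac{\partial}{\partial z}\sigma_{zz}^{t}(\mathbf{x}) + S_z^{t}(\mathbf{x})\right]
       \end{aligned}
       $$
       [Figure labels: Forward, Adjoint]
     • Based on staggered grid
     • For space: 10th-order F.D. approximation
       [Figure: stencil reaching 4 or 5 points to one side and 5 or 4 points to the other]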
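
The staggered-grid scheme on slide 9 can be illustrated with a small sketch. This is not the presented kernel: the coefficients are the standard Taylor-series values for a 10th-order staggered first derivative (an assumption, since the slides do not list the coefficients used), the names `staggered_derivative` and `update_velocity` are hypothetical, and placing the source term inside the bracket is also an assumption.

```python
from fractions import Fraction
import math

# Standard Taylor-series coefficients for a 10th-order staggered-grid first
# derivative (assumed here; the slides do not list the coefficients used).
COEFFS = (
    Fraction(19845, 16384),
    Fraction(-735, 8192),
    Fraction(567, 40960),
    Fraction(-405, 229376),
    Fraction(35, 294912),
)


def staggered_derivative(f, x0, h):
    """Approximate f'(x0) from samples at x0 +/- (2m - 1) * h / 2, m = 1..5.

    The derivative sits half a cell away from the samples, which is why
    velocities and stresses live on interleaved (staggered) grids.
    """
    total = 0
    for m, c in enumerate(COEFFS, start=1):
        offset = (2 * m - 1) * h / 2
        total += c * (f(x0 + offset) - f(x0 - offset))
    return total / h


def update_velocity(v_prev, rho, dt, dsxx_dx, dsxy_dy, dsxz_dz, source=0.0):
    """One 2nd-order (leapfrog) step: V(t + dt/2) from V(t - dt/2)."""
    return v_prev + dt / rho * (dsxx_dx + dsxy_dy + dsxz_dz + source)


if __name__ == "__main__":
    approx = staggered_derivative(math.sin, 0.3, 0.1)
    print(abs(approx - math.cos(0.3)))  # discretisation error of the stencil
```

Each derivative reads five samples on either side of the evaluation point, so a subdomain needs several neighbouring cells in every direction before it can update its own edge.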

  10-13. 3D Elastic RTM Stencils
     • GPU Optimizations
       – K40 GPU, (200*200*200)*1000ts
         • Configuration of different block sizes and registers per block
         • Best: block size ← 20*20; max registers ← 56
         • Variable data into L1/shared memory, constant data into read-only cache
       – Dynamic Pointer Switch & Minimum Data Cubes
         • Only malloc data cubes covering three steps (t-1, t, t+1) for each field ($\sigma_{yz}$, …, $V_x$, …)
         • The three cubes are relabelled from step to step by switching pointers: (pre, cur, next), then (cur, next, pre), then (next, pre, cur)
         [Figure labels: $\Delta t$, Memory]
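
The pointer switch can be sketched as follows. This is a minimal illustration under stated assumptions, not the presented CUDA code: `run` and `footprint_bytes` are hypothetical names, the update rule is a stand-in for the real stencil, and the footprint estimate assumes nine state fields (three velocities, six stresses) stored in single precision.

```python
def footprint_bytes(nx, ny, nz, n_fields=9, n_steps=3, bytes_per_value=4):
    """Memory for n_fields state cubes held at n_steps time levels.

    n_fields=9 and bytes_per_value=4 (single precision) are assumptions.
    """
    return nx * ny * nz * n_fields * n_steps * bytes_per_value


def run(n_cells, n_steps):
    """Advance a toy three-level recurrence with three rotating buffers."""
    pre = [0.0] * n_cells  # values at t-1
    cur = [1.0] * n_cells  # values at t
    nxt = [0.0] * n_cells  # scratch space for t+1
    allocated = {id(pre), id(cur), id(nxt)}
    for _ in range(n_steps):
        for i in range(n_cells):
            # Stand-in for the stencil: any update that needs t-1 and t.
            nxt[i] = 2.0 * cur[i] - pre[i]
        # Pointer switch: relabel the buffers, no data is copied.
        pre, cur, nxt = cur, nxt, pre
    # The same three allocations are still in use after all steps.
    assert {id(pre), id(cur), id(nxt)} == allocated
    return cur


if __name__ == "__main__":
    print(run(4, 5))
    print(footprint_bytes(200, 200, 200))
```

Under those assumptions a 200*200*200 grid needs 864,000,000 bytes for the three time levels, independent of how many time steps are run.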

  14-16. 3D Elastic RTM Stencils
     • GPU Optimizations
       – Multiple GPUs
         [Figure: 3D domain with x, y, z axes; labels: halo, 4 or 5, Internal, 5 or 4, halo]

  17. 3D Elastic RTM Stencils
     • GPU Optimizations
       – Multiple GPUs
         • GPU algorithm per stencil sweep, for each subdomain:
           ① Calculate RTM stencil
           ② Update Halo
           ③ Add Source
           ④ Switch Pointer
         [Figure: GPU 0, GPU 1 and GPU 2 each hold an Internal region with halos; workflow labels: Stencil Computing, Updating halo]
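
The four steps above can be sketched on a 1D field split into subdomains. This is a CPU-only illustration under stated assumptions, not the multi-GPU code: the averaging stencil stands in for the RTM update, `RADIUS = 2` is a hypothetical halo width (the slides show 4 or 5 points), and all function names are made up for the example. Because the source is added after the halo update, the sketch adds it to every copy of the source cell, including halo copies.

```python
RADIUS = 2  # hypothetical halo width; the slides use 4 or 5 points


def stencil(cur, nxt, lo, hi):
    """Toy averaging stencil standing in for the RTM update on cells lo..hi-1."""
    width = 2 * RADIUS + 1
    for i in range(lo, hi):
        nxt[i] = sum(cur[i - RADIUS:i + RADIUS + 1]) / width


def update_halos(bufs, size):
    """Copy each subdomain's edge cells into its neighbours' halo cells."""
    for p, buf in enumerate(bufs):
        if p > 0:
            buf[:RADIUS] = bufs[p - 1][size:size + RADIUS]
        if p < len(bufs) - 1:
            buf[RADIUS + size:] = bufs[p + 1][RADIUS:2 * RADIUS]


def add_source(bufs, size, index, value):
    """Add value at global cell index in every subdomain holding a copy."""
    for p, buf in enumerate(bufs):
        local = index - p * size + RADIUS
        if 0 <= local < len(buf):
            buf[local] += value


def simulate(field, n_parts, n_steps, src_index, src_value):
    """Run the four-step sweep on a 1D field split into n_parts subdomains."""
    size = len(field) // n_parts  # assumes an even split with size >= RADIUS
    curs = []
    for p in range(n_parts):
        buf = [0.0] * (size + 2 * RADIUS)  # layout: [halo | internal | halo]
        buf[RADIUS:RADIUS + size] = field[p * size:(p + 1) * size]
        curs.append(buf)
    nxts = [[0.0] * (size + 2 * RADIUS) for _ in range(n_parts)]
    update_halos(curs, size)
    for _ in range(n_steps):
        for cur, nxt in zip(curs, nxts):
            stencil(cur, nxt, RADIUS, RADIUS + size)  # 1. calculate stencil
        update_halos(nxts, size)                      # 2. update halo
        add_source(nxts, size, src_index, src_value)  # 3. add source
        curs, nxts = nxts, curs                       # 4. switch pointers
    return [v for buf in curs for v in buf[RADIUS:RADIUS + size]]


if __name__ == "__main__":
    field = [float(i % 5) for i in range(24)]
    print(simulate(field, 3, 6, 8, 1.0) == simulate(field, 1, 6, 8, 1.0))
```

Running with `n_parts=1` gives a single-domain reference, so comparing it with a split run checks that the halo update keeps the subdomains consistent.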

  18. 3D Elastic RTM Stencils
     • GPU Optimizations
       – Multiple GPUs
         • GPU algorithm per stencil sweep (overlapping workflow), for each subdomain:
           ① Calculate halo RTM stencil
           ② Calculate Internal RTM stencil
           ③ Update Halo
           ④ Add Source
           ⑤ Switch Pointers
         [Figure: GPU 0, GPU 1 and GPU 2 each hold an Internal region with halos; the halo update overlaps the Internal computation]
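
The overlapping order can be sketched in the same 1D toy setting. This is only an illustration of the ordering: a background Python thread stands in for an asynchronous halo transfer, the slides do not say which GPU mechanism was used, `RADIUS = 2` is a hypothetical halo width, and all function names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

RADIUS = 2  # hypothetical halo width; the slides use 4 or 5 points


def stencil(cur, nxt, lo, hi):
    """Toy averaging stencil standing in for the RTM update on cells lo..hi-1."""
    width = 2 * RADIUS + 1
    for i in range(lo, hi):
        nxt[i] = sum(cur[i - RADIUS:i + RADIUS + 1]) / width


def update_halos(bufs, size):
    """Copy each subdomain's edge cells into its neighbours' halo cells."""
    for p, buf in enumerate(bufs):
        if p > 0:
            buf[:RADIUS] = bufs[p - 1][size:size + RADIUS]
        if p < len(bufs) - 1:
            buf[RADIUS + size:] = bufs[p + 1][RADIUS:2 * RADIUS]


def sweep_overlapped(curs, nxts, size, pool, src_index, src_value):
    """One sweep; returns the (cur, next) lists after the pointer switch."""
    # 1. Edge strips first: these are the cells the neighbours' halos need.
    for cur, nxt in zip(curs, nxts):
        stencil(cur, nxt, RADIUS, 2 * RADIUS)
        stencil(cur, nxt, size, size + RADIUS)
    # 3. Start the halo update in the background ...
    pending = pool.submit(update_halos, nxts, size)
    # 2. ... while the internal cells are computed.
    for cur, nxt in zip(curs, nxts):
        stencil(cur, nxt, 2 * RADIUS, size)
    pending.result()  # wait until the halo update has finished
    # 4. Add the source to every copy of the global cell src_index.
    for p, nxt in enumerate(nxts):
        local = src_index - p * size + RADIUS
        if 0 <= local < len(nxt):
            nxt[local] += src_value
    return nxts, curs  # 5. switch pointers


def simulate(field, n_parts, n_steps, src_index, src_value):
    """Run overlapped sweeps on a 1D field split into n_parts subdomains."""
    size = len(field) // n_parts  # assumes an even split, size >= 2 * RADIUS
    pad = [0.0] * RADIUS
    curs = [pad + list(field[p * size:(p + 1) * size]) + pad
            for p in range(n_parts)]
    nxts = [[0.0] * (size + 2 * RADIUS) for _ in range(n_parts)]
    update_halos(curs, size)
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(n_steps):
            curs, nxts = sweep_overlapped(curs, nxts, size, pool,
                                          src_index, src_value)
    return [v for buf in curs for v in buf[RADIUS:RADIUS + size]]


if __name__ == "__main__":
    field = [float(i % 5) for i in range(24)]
    print(simulate(field, 3, 6, 8, 1.0) == simulate(field, 1, 6, 8, 1.0))
```

The overlap is safe because the background copy touches only edge strips and halo cells of the next buffers, while the main loop writes only internal cells. Python threads give little real concurrency here; the point is the dependency order, which is what allows a transfer to hide behind the internal computation.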

  19-26. 3D Elastic RTM Stencils
     • Validation and Performance
       – GPU Cluster in SEP
         • 4 K40 GPUs over 24 core CPU (OpenMP)
         • 200*200*200 grid, 1000 steps (record every 100 steps)
         [Figure panels: Vx, Vy]
