Large-Scale Adaptive Mesh Simulations Through Non-Volatile Byte-Addressable Memory


  1. Large-Scale Adaptive Mesh Simulations Through Non-Volatile Byte-Addressable Memory
     Bao Nguyen, Hua Tan, Xuechen Zhang, Kei Davis*

  2. Octree Meshing is Widely Used in HPC Simulation
     Examples: droplet breakup, micro-boiling, droplet ejection.

  3. Quad/Octree-Based Adaptive Meshing
     [Figure: a domain decomposition and its quad/octree representation in DRAM.]
     Because models span larger length and time scales, DRAM demand is significant even on supercomputers.
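
As a rough illustration of the tree representation on this slide, the following C sketch models a pointer-based quadtree (the 2D analogue of an octree) in ordinary memory and refines a leaf into four children. The `quad_node` type and the helpers `new_leaf`, `refine`, `count_leaves`, and `free_tree` are hypothetical names chosen for this example; they are not taken from the presentation.

```c
#include <stdlib.h>

/* Hypothetical quadtree node: a leaf has all four children set to NULL. */
typedef struct quad_node {
    struct quad_node *child[4];
    int level;                      /* depth in the tree; the root is level 0 */
} quad_node;

/* Allocate a leaf at the given level; returns NULL if allocation fails. */
quad_node *new_leaf(int level) {
    quad_node *n = malloc(sizeof *n);
    if (!n) return NULL;
    for (int i = 0; i < 4; i++) n->child[i] = NULL;
    n->level = level;
    return n;
}

int is_leaf(const quad_node *n) { return n->child[0] == NULL; }

/* Refine a leaf into four children. Returns 0 on success, -1 otherwise. */
int refine(quad_node *n) {
    if (!is_leaf(n)) return -1;
    quad_node *kids[4];
    for (int i = 0; i < 4; i++) {
        kids[i] = new_leaf(n->level + 1);
        if (!kids[i]) {             /* undo the partial allocation */
            while (i--) free(kids[i]);
            return -1;
        }
    }
    for (int i = 0; i < 4; i++) n->child[i] = kids[i];
    return 0;
}

/* Count the leaves, i.e. the mesh cells the tree currently represents. */
int count_leaves(const quad_node *n) {
    if (is_leaf(n)) return 1;
    int total = 0;
    for (int i = 0; i < 4; i++) total += count_leaves(n->child[i]);
    return total;
}

void free_tree(quad_node *n) {
    if (!n) return;
    for (int i = 0; i < 4; i++) free_tree(n->child[i]);
    free(n);
}
```

Each refinement replaces one leaf with four, so memory use grows with the resolution of the mesh, which is the DRAM pressure the slide describes.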

  4. Per-Core DRAM Capacity is Shrinking on Supercomputers
     • Jaguar: 2.7-4 GB/core
     • Titan: 2 GB/core
     The decline is due to the associated capital cost and power consumption.

  5. Using Non-Volatile Byte-Addressable Memory for Meshing

     | Technology | Non-Volatility | Byte-Addressability | Speed | Cost | Power |
     | --- | --- | --- | --- | --- | --- |
     | Flash | Yes | No | Low | Decreasing | Low |
     | DRAM | No | Yes | High | Increasing | High |
     | NVBM | Yes | Yes | High* | Decreasing | Low |

  6. Existing Applications were Not Designed for NVBM
     • In-core algorithms: linear octree [SC'07], parallel octree [SC'05], etc. But they save snapshots on storage systems for failure recovery; I/O can be the bottleneck.
     • Out-of-core algorithms: Etree [SC'04], visualization [TVCG'97], etc. But they were designed for slow non-volatile media, e.g., SSDs and HDDs.
     Can we support in-NVBM octree meshing that bypasses slow I/O buses?

  7. Challenge I: NVBM Writes Incur Higher Latency
     • NVBM write latency is 2.5X that of DRAM.
     • Meshing operations (e.g., refinement) are write-intensive.

  8. Challenge II: Existing Octrees Are Not Durable for NVBM
     [Figure: an octree after a normal pointer write and after a failed pointer write.]
     A failure may cause the pointer to link to an undefined region in NVBM.
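
One common way to avoid this class of failure, shown here only as a general illustration and not as the mechanism PM-octree uses, is to fully initialize and persist a new octant before publishing the pointer to it with a single atomic store. In the C sketch below, `octant`, `append_octant`, and `persist` are hypothetical names; `persist` is a no-op placeholder for whatever platform-specific flush and fence real NVBM hardware would require, and `malloc` stands in for an NVBM allocator.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical octant with a single outgoing link, kept minimal on purpose. */
typedef struct octant {
    int id;
    _Atomic(struct octant *) next;
} octant;

/* Placeholder (assumption): on real NVBM this would flush the given range
 * from the CPU caches and fence. Here it does nothing. */
void persist(const void *addr, size_t len) { (void)addr; (void)len; }

/* Link a new octant under parent. The pointer is published only after the
 * target is fully written, so a crash before the publishing store leaves
 * parent->next at its old, valid value. Returns 0 on success, -1 on failure. */
int append_octant(octant *parent, int id) {
    octant *n = malloc(sizeof *n);        /* stand-in for an NVBM allocation */
    if (!n) return -1;
    n->id = id;
    atomic_init(&n->next, NULL);
    persist(n, sizeof *n);                /* step 1: make the target durable */
    atomic_store_explicit(&parent->next, n, memory_order_release);
                                          /* step 2: publish in one store */
    persist(&parent->next, sizeof parent->next);
                                          /* step 3: make the link durable */
    return 0;
}
```

The ordering matters more than the specific calls: the failure on the slide arises when the link becomes visible before the region it points to has been defined.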

  9. Challenge III: Difficult to Handle Special Pointers
     [Figure: an octree spanning NVBM and DRAM, with the special pointers marked.]
     Handling special pointers introduces extra complexity for application developers.

  10. Design Objectives of Persistent-Merged Octree
      • In-NVBM meshing and storage
      • Hiding write latency to NVBM
      • Orthogonal persistence
      Together these give the persistent-merged octree (PM-octree).

  11. PM-Octree Design: A Multi-Version Data Structure
      • V_{i-1}: persistent, stored in NVBM
      • V_i: volatile, stored in DRAM + NVBM
      The persistent version provides the desired durability.

  12. PM-Octree Design: Octant Sharing between Versions
      [Figure: versions V_{i-1} and V_i sharing octants of the C_1 tree in NVBM.]
      Observation: many spatial domains do not change in adjacent time steps.
      Sharing reduces the memory usage by up to 1.9X.

  13. PM-Octree Design: Partitioned Data Structure
      [Figure: version V_i partitioned into a C_0 tree in DRAM and a C_1 tree in NVBM.]
      The partitioning makes effective use of both DRAM and NVBM.

  14. PM-Octree Design: Dynamic Layout Transformation
      [Figure: version V_i before and after a layout transformation, showing which octants reside in DRAM and which in NVBM.]
      Layout transformation is executed periodically to hide NVBM write latency.

  15. Putting Together the Components of PM-Octree
      [Figure: versions V_{i-1} and V_i, with the C_1 trees in NVBM and the C_0 tree in DRAM.]
      PM-octree is a multi-version data structure for both in-memory meshing and storage. It provides near-instantaneous failure recovery because data is accessed over the memory bus.

  16. Basic Operation: Octant Insertion
      [Figure: versions V_{i-1} and V_i before and after inserting octant 11. After the insertion, V_i has nodes labeled R', u', and 9' alongside the new octant 11.]

  17. Basic Operation: Octant Update
      [Figure: versions V_{i-1} and V_i before and after updating octant 10. After the update, V_i includes a node labeled 10'.]
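
The primed labels in the insertion and update figures (R', u', 9', 10') suggest copy-on-write along the path from the root to the affected octant. Assuming that reading, the C sketch below shows path copying on a pointer-based octree: an update returns a new root, copies only the nodes on the path, and shares every other subtree with the previous version. The names `octant`, `make_octant`, and `cow_update` are hypothetical and the code is a minimal illustration, not the PM-octree implementation.

```c
#include <stdlib.h>

#define FANOUT 8

/* Hypothetical octant carrying one payload value. */
typedef struct octant {
    double value;
    struct octant *child[FANOUT];
} octant;

/* Allocate an octant with no children; returns NULL if allocation fails. */
octant *make_octant(double value) {
    octant *o = malloc(sizeof *o);
    if (!o) return NULL;
    o->value = value;
    for (int i = 0; i < FANOUT; i++) o->child[i] = NULL;
    return o;
}

/* Copy-on-write update. path[0..depth-1] lists the child indices leading
 * from node to the target octant. Returns the root of the new version, or
 * NULL if the path does not exist or allocation fails. The old version is
 * never modified. */
octant *cow_update(const octant *node, const int *path, int depth,
                   double value) {
    if (!node) return NULL;
    octant *copy = malloc(sizeof *copy);
    if (!copy) return NULL;
    *copy = *node;                  /* the copy shares all children at first */
    if (depth == 0) {
        copy->value = value;        /* reached the target: apply the update */
        return copy;
    }
    octant *sub = cow_update(node->child[path[0]], path + 1, depth - 1, value);
    if (!sub) {
        free(copy);
        return NULL;
    }
    copy->child[path[0]] = sub;     /* only the child on the path is replaced */
    return copy;
}
```

For a tree of depth d, an update allocates at most d + 1 new octants regardless of the total tree size, which is consistent with the earlier observation that most of the domain is unchanged between adjacent time steps.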

  18. PM-Octree Design: Orthogonal Persistence

      | Routine | Description |
      | --- | --- |
      | `pmoctree *pm_create(octree *tree)` | Create a new PM-octree; return a pointer to V_i |
      | `void pm_persistent(pmoctree *tree)` | Create a persistent version of the octree |
      | `pmoctree *pm_restore(void)` | Restore a PM-octree; return a pointer to V_i |
      | `void pm_delete(pmoctree *tree)` | Delete all octants on NVBM and DRAM |

      We integrated it with the Gerris flow solver.
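
To make the calling pattern of the four routines concrete, the C sketch below pairs the signatures from this slide with toy mock bodies. Everything except the four signatures is an assumption made for illustration: the `octree` and `pmoctree` struct layouts, the `nvbm_region` variable that pretends to survive a crash, and in particular the behavior that `pm_restore` rolls the working state back to the last persistent version.

```c
#include <stdlib.h>

/* Stand-in types (assumptions): the real structures are not shown. */
typedef struct octree { int num_octants; } octree;
typedef struct pmoctree {
    octree working;     /* stands in for the volatile version V_i */
    octree persisted;   /* stands in for the persistent version V_{i-1} */
} pmoctree;

/* Mock: pretend this pointer lives in NVBM and survives a failure. */
static pmoctree *nvbm_region = NULL;

/* Mock body: wrap an existing octree in a new PM-octree. */
pmoctree *pm_create(octree *tree) {
    pmoctree *pm = malloc(sizeof *pm);
    if (!pm) return NULL;
    pm->working = *tree;
    pm->persisted = *tree;
    nvbm_region = pm;
    return pm;
}

/* Mock body: the current working state becomes the persistent version. */
void pm_persistent(pmoctree *tree) { tree->persisted = tree->working; }

/* Mock body: recover by returning to the last persistent version. */
pmoctree *pm_restore(void) {
    if (nvbm_region) nvbm_region->working = nvbm_region->persisted;
    return nvbm_region;
}

/* Mock body: release everything. */
void pm_delete(pmoctree *tree) {
    if (tree == nvbm_region) nvbm_region = NULL;
    free(tree);
}
```

A solver built on this pattern would call `pm_create` once, call `pm_persistent` at the end of each time step it wants to be able to recover to, and call `pm_restore` on restart after a failure instead of reading a snapshot file.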

  19. Experimental Setting
      • Hardware
        - Titan at ORNL
        - Emulation of NVBM using DRAM

      | Latency (ns) | DRAM | NVBM |
      | --- | --- | --- |
      | Read | 60 | 100 |
      | Write | 60 | 150 |

      • Simulation
        - Droplet rotation and ejection
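
The emulated latencies on this slide can be combined into a simple back-of-the-envelope model. The C sketch below computes the NVBM-to-DRAM write latency ratio (150 / 60 = 2.5, matching Challenge I) and the average access latency for a given mix of reads and writes. The function names and the idea of a single `write_fraction` parameter are assumptions for illustration; the presentation does not state an access mix.

```c
/* Emulated latencies from the experimental-setting slide, in nanoseconds. */
enum {
    DRAM_READ_NS  = 60,
    DRAM_WRITE_NS = 60,
    NVBM_READ_NS  = 100,
    NVBM_WRITE_NS = 150
};

/* Average latency per access when write_fraction of accesses are writes
 * (0.0 = all reads, 1.0 = all writes). Simplified model: ignores caches. */
double avg_latency_ns(double read_ns, double write_ns, double write_fraction) {
    return (1.0 - write_fraction) * read_ns + write_fraction * write_ns;
}

/* Ratio of NVBM write latency to DRAM write latency. */
double nvbm_write_penalty(void) {
    return (double)NVBM_WRITE_NS / (double)DRAM_WRITE_NS;
}
```

Under this model, an even read/write mix costs 0.5 * 100 + 0.5 * 150 = 125 ns per access on NVBM against 60 ns on DRAM, which is why keeping write-heavy octants in DRAM matters for meshing.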

  20. Comparison of Meshing Methods

      | Method name | Objects in DRAM | Objects in NVBM | Interface |
      | --- | --- | --- | --- |
      | In-core-octree | Octants | Snapshot | File system |
      | Out-of-core-octree | Cache | Octant record | File system |
      | PM-octree | Octants | Octants | Memory |

  21. Weak Scaling
      • 1.2M to 1077M elements
      • 1 to 1000 PEs
      • Number of elements on each PE: ~1 million
      The execution time of PM-octree grows logarithmically with problem size.

  22. Execution Time Breakdown with Weak Scaling
      Tree partitioning overhead prevents PM-octree from achieving optimal speedup.

  23. Strong Scaling
      • Problem size is 150 million elements
      • 240 to 1000 PEs
      The scalability of PM-octree is similar to that of in-core-octree.

  24. Execution Time Breakdown with Strong Scaling
      No scalability issue is apparent, because no major fluctuation is observed.

  25. Failure Recovery
      • PM-octree reduces the failure recovery time by up to 20X.
      • PM-octree guarantees data consistency after failures.

  26. Conclusions
      • PM-octree effectively extends memory capacity using NVBM.
      • It scales as well as in-core algorithms.
      • It significantly reduces failure recovery time.
      • It provides an easy-to-program interface.

  27. Acknowledgments
      Xuechen Zhang (xuechen.zhang@wsu.edu), Bao Nguyen, Hua Tan

  28. Basic Operation: Octant Merging
      [Figure: versions V_{i-1} and V_i before and after merging C_0, showing the C_1 subtree in NVBM and the C_0 subtree in DRAM.]

  29. Basic Operations: Persistent
      [Figure: versions V_{i-1}, V_i, and V_{i+1}, with roots R and R', before and after the persistent operation.]

  30. Dynamic Layout Transformation
      Execution time is reduced by 25%, and the number of writes is reduced by up to 30%.

  31. Impact of DRAM Size
      The DRAM size influences both the merging frequency and the execution time.
