A Fully GPU-Based Out-Of-Core Approach to Handle Large Volume Data

  1. A Fully GPU-Based Out-Of-Core Approach to Handle Large Volume Data
     GPU Technology Conference, Munich, Germany, 09-11 Oct 2018
     Nicolas Courilleau 1,2, Jonathan Sarton 1, Florent Duguet 1,3, Yannick Remion 1 and Laurent Lucas 1
     1 – Université de Reims Champagne-Ardenne, France
     2 – Neoxia, France
     3 – Altimesh, France

  2. Outline
     • Background and motivation
     • Previous works
     • Out-of-core model presentation
     • Model in action: application to visualization
     • Conclusion and outlook

  3. Context
     • HPC targets of 3DNeuroSecure
     • Interactive processing and visualization (virtual microscopy, DVR) of very large biomedical datasets
     • Accelerating drug discovery for Alzheimer's disease
     [Diagram labels: Local, 3D data (x TB), Offshore, Teleworking]
     N. Courilleau et al., 2018-10-11, GTC 2018 Munich, E8246, Room 3

  4. Problem statement
     • Designing out-of-core algorithms
     • Voxel representation → high volume of data >> CPU and GPU memory
     Domain/Application: Data size
     • Mesh voxelization: ≥ 100 GB; 4352³ (RGBA – 32 bits) ≈ 330 GB
     • Histology to electron microscopy: ≥ 100 GB to several TB, and beyond
     [Figure: regular 3D grid]
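The 330 GB figure quoted for the 4352³ RGBA volume can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal gigabytes (10⁹ bytes) and 4 bytes per voxel; the function name is illustrative and does not come from the slides.

```python
def volume_size_bytes(side: int, bytes_per_voxel: int) -> int:
    """Size in bytes of a cubic regular grid with `side` voxels per axis."""
    return side ** 3 * bytes_per_voxel


# 4352^3 voxels, RGBA stored on 32 bits (4 bytes) per voxel.
size = volume_size_bytes(4352, 4)
size_gb = size / 1e9  # decimal gigabytes (assumption)
print(f"{size_gb:.1f} GB")
```

A volume of this size exceeds the memory of any single GPU available at the time of the talk, which is what motivates the out-of-core approach.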

  5. Previous works

  6. Previous works
     • 2009: [Crassin et al.], ACM SIGGRAPH i3D
     • 2011: [Klaus Engel], IEEE Symposium on Large Data Analysis & Visualization
     • 2012: [Hadwiger et al.], IEEE Transactions on Visualization & Computer Graphics
     • 2013: [Fogal et al.], IEEE Symposium on Large Data Analysis & Visualization

  7. Previous works… at a glance
     • Address translation taxonomy [Beyer et al. 2015]

  8. Previous works… at a glance
     • Bricking: page table look-up
     • Octree multi-resolution: tree traversal
     • Multi-resolution page table
     [Beyer et al. 2015]

  11. And Nvidia – Pascal / Volta unified memory
      • GPU memory oversubscription (unified memory)
        • Limited to host memory / OS specs limitation
        • Volume decomposition still needed
      • Volta using
        • Nvidia Tesla V100
        • IBM Power 9
        • NVLink 2 (+ OS ATS)
        • Unix « mmap »
        • Unix kernel 4.16 (at least)
      • Limitations
        • ATS over NVLink 2 = Power 9
        • NVLink 2 = Tesla V100
        • No page fault control
        • No texture memory
      [Image: Summit - DOE/SC/Oak Ridge National Laboratory]
      [Everything you need to know about unified memory, Nikolay Sakharnykh, GTC 2018]

  12. Our contributions
      • GPU-based out-of-core data management
        • Multiresolution multilevel page table hierarchy
        • Managed entirely on GPU
      • Any kind of application (regular 3D grids of voxels)
        • Interactive visualization
        • On-demand data processing
        • Both at the same time
      • CPU – GPU communications reduced
      • Complete pipeline – from storage to GPU
      In addition,
      • Multi OS support, since Kepler architecture

  13. Out-of-core model presentation

  14. Data representation and storage
      • (1) Multiresolution – levels of detail
      • (2) Bricking – level subdivision
        • Allows the out-of-core approach
      • (1) + (2) = Bricked multiresolution 3D pyramid
      • Bonus: data compression (LZ4 – lossless, real-time decompression)
      [Figure: 3D pyramid with Level 0, Level 1 and Level 2]
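To make the bricked multiresolution pyramid concrete, the sketch below lists the levels of such a pyramid for a cubic volume. It is an illustration under stated assumptions, not the authors' implementation: level 0 is taken as the finest level, each level halves the resolution along every axis, and the 64-voxel brick side is a hypothetical value (the slides do not state the brick size).

```python
import math


def pyramid_levels(side: int, brick: int):
    """Return (level, voxels_per_axis, bricks_per_axis) for each level,
    halving the resolution until a single brick covers the whole volume."""
    levels = []
    level = 0
    while True:
        bricks = math.ceil(side / brick)
        levels.append((level, side, bricks))
        if bricks == 1:
            break
        side = math.ceil(side / 2)  # next (coarser) level of detail
        level += 1
    return levels


# 4352 voxels per axis comes from the slides; the brick side of 64 is hypothetical.
levels = pyramid_levels(4352, 64)
for level, side, bricks in levels:
    print(f"level {level}: {side}^3 voxels, {bricks}^3 bricks")
```

Because every level is cut into bricks of the same size, any brick at any level of detail can be loaded independently, which is what makes the out-of-core approach possible.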

  15. Multiresolution multilevel page table hierarchy
      [Diagram: brick cache]

  16. Multiresolution multilevel page table hierarchy
      [Diagram: multiresolution page table, brick cache]

  17. Multiresolution multilevel page table hierarchy
      [Diagram: multiresolution page directory, page table cache, brick cache]

  18. Multiresolution multilevel page table hierarchy
      • Entry =
        • 3D coordinates of the block in the next cache
        • + Flag: Mapped, Unmapped or Empty
      [Diagram: multiresolution page directory, page table cache, brick cache]
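An entry that combines 3D block coordinates with a state flag can be stored in a single integer. The sketch below shows one possible encoding; the 32-bit layout (10 bits per coordinate, 2 bits for the flag), the flag values and the function names are assumptions made for illustration, since the slides do not specify the actual layout.

```python
from enum import IntEnum


class Flag(IntEnum):
    # The three states come from the slide; the numeric values are hypothetical.
    UNMAPPED = 0
    MAPPED = 1
    EMPTY = 2


COORD_BITS = 10                       # hypothetical: 10 bits per coordinate
COORD_MASK = (1 << COORD_BITS) - 1


def pack_entry(x: int, y: int, z: int, flag: Flag) -> int:
    """Pack block coordinates and a flag into one 32-bit integer."""
    for c in (x, y, z):
        if not 0 <= c <= COORD_MASK:
            raise ValueError(f"coordinate {c} does not fit in {COORD_BITS} bits")
    return (int(flag) << 30) | (z << 20) | (y << 10) | x


def unpack_entry(entry: int):
    """Recover (x, y, z, flag) from a packed entry."""
    x = entry & COORD_MASK
    y = (entry >> 10) & COORD_MASK
    z = (entry >> 20) & COORD_MASK
    return x, y, z, Flag(entry >> 30)


entry = pack_entry(3, 7, 1, Flag.MAPPED)
print(unpack_entry(entry))
```

A compact entry of this kind lets a look-up test the flag before following the coordinates into the next cache.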

  19. Virtual addressing
      • Virtual volume navigation – address = [l, p]
        • l = level of detail
        • p = 3D normalized position (x, y, z) ∈ [0, 1)³
      [Diagram: MRPD, PT1, brick cache]
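A virtual address made of a level of detail and a normalized 3D position can be turned into a brick index plus a voxel offset inside that brick. The following is a minimal sketch of that arithmetic; the function name, the per-level brick counts and the brick size are hypothetical, and it assumes each level spans exactly a whole number of bricks per axis.

```python
def locate(level_bricks, brick_size, level, position):
    """Map a (level, normalized position) address to
    (brick coordinates, voxel offset inside the brick)."""
    n = level_bricks[level]              # bricks per axis at this level
    brick, offset = [], []
    for c in position:
        if not 0.0 <= c < 1.0:
            raise ValueError("normalized coordinates must lie in [0, 1)")
        voxel = int(c * n * brick_size)  # voxel index along this axis
        brick.append(voxel // brick_size)
        offset.append(voxel % brick_size)
    return tuple(brick), tuple(offset)


# Hypothetical pyramid: 4, 2 and 1 bricks per axis, bricks of 8 voxels per side.
level_bricks = [4, 2, 1]
print(locate(level_bricks, 8, 0, (0.5, 0.25, 0.0)))
print(locate(level_bricks, 8, 1, (0.5, 0.25, 0.99)))
```

Since the position is normalized, the same position can be queried at any level of detail without the application knowing the level's resolution.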

  20. Cache miss
      • Virtual volume navigation – address = [l, p]
        • l = level of detail
        • p = 3D normalized position (x, y, z) ∈ [0, 1)³
      [Diagram: MRPD, PT1, brick cache, with a cache miss marked]
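The look-up through the page directory and page table, and what happens on a cache miss, can be sketched as follows. This is a sequential Python illustration under stated assumptions, not the authors' GPU code: the class name, the dictionary-based tables and the page table size of 4 entries per axis are all hypothetical.

```python
class Hierarchy:
    """Toy two-stage look-up: page directory -> page table -> brick cache slot."""

    def __init__(self):
        self.page_directory = {}  # (level, directory coords) -> page table id
        self.page_tables = {}     # (page table id, entry coords) -> brick cache slot
        self.requests = set()     # bricks that missed during this pass

    def lookup(self, level, brick, pt_size=4):
        """Return the brick cache slot, or None on a cache miss."""
        dir_coords = tuple(b // pt_size for b in brick)
        entry_coords = tuple(b % pt_size for b in brick)
        table = self.page_directory.get((level, dir_coords))
        slot = None
        if table is not None:
            slot = self.page_tables.get((table, entry_coords))
        if slot is None:
            # Cache miss: remember the brick so it can be requested later.
            self.requests.add((level, brick))
        return slot


h = Hierarchy()
h.page_directory[(0, (0, 0, 0))] = 7      # hypothetical page table id
h.page_tables[(7, (1, 2, 3))] = (5, 0, 0)  # hypothetical brick cache slot
print(h.lookup(0, (1, 2, 3)))  # mapped brick: returns its slot
print(h.lookup(0, (5, 2, 3)))  # unmapped brick: returns None and is requested
```

Storing misses in a set means a brick touched by many look-ups is requested only once.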

  21. Pipeline 1 – Voxel cache request
      [Pipeline diagram, step 1 marked. Labels: localhost, mass storage (bricked levels L0, L1, L2), CPU (request handler, requests, asynchronous thread handling, RAM brick cache), CUDA zero-copy communication (bricks positions, cache manager request list, requested bricks), GPU (multi-level cache manager, LRUs update + hierarchy update, multi-resolution page table hierarchy, data cache), application interface, end-user application]

  22. Pipeline 2 – Hierarchy look-up
      [Same pipeline diagram, steps 1 and 2 marked]

  23. Pipeline 2.1 – Request list creation
      [Same pipeline diagram, steps 1, 2 and 2.1 marked]

  24. Pipeline 2.2 – Request list asynchronous handling
      [Same pipeline diagram, steps 1, 2, 2.1 and 2.2 marked]
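Asynchronous handling of a request list by a dedicated CPU thread can be sketched with a worker thread and two queues. This is an illustration only: the function names, the `None` sentinel and the fake `load_brick` callback are hypothetical, and the real pipeline exchanges data with the GPU rather than between Python threads.

```python
import queue
import threading


def request_handler(requests, results, load_brick):
    """Serve request lists until a None sentinel arrives."""
    while True:
        request_list = requests.get()
        if request_list is None:
            break
        results.put({brick_id: load_brick(brick_id) for brick_id in request_list})


requests, results = queue.Queue(), queue.Queue()
worker = threading.Thread(
    target=request_handler,
    # Hypothetical loader standing in for the CPU cache and mass storage.
    args=(requests, results, lambda brick_id: f"data for {brick_id}"),
)
worker.start()

requests.put([(0, (1, 2, 3)), (1, (0, 0, 0))])  # the caller is free to keep working
loaded = results.get()                           # blocks until the bricks are ready
requests.put(None)                               # stop the worker
worker.join()
print(loaded)
```

The point of the separate thread is that the producer of requests does not wait on disk or decompression; it only collects the bricks once they are available.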

  25. Pipeline 2.3 – CPU cache look-up (simple cache)
      [Same pipeline diagram, steps 1, 2, 2.1, 2.2 and 2.3 marked]

  26. Pipeline 2.4 – If not in CPU cache: load from mass storage
      [Same pipeline diagram, steps 1, 2 and 2.1 to 2.4 marked]
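Steps 2.3 and 2.4 amount to a cache look-up with a fallback to mass storage. A minimal sketch, assuming a least-recently-used eviction policy for the RAM brick cache (the slides only say "simple cache" for the CPU side, so the policy, the class name and the `read_from_storage` callback are assumptions):

```python
from collections import OrderedDict


class RamBrickCache:
    """Toy CPU-side brick cache that falls back to storage on a miss."""

    def __init__(self, capacity, read_from_storage):
        self.capacity = capacity
        self.read_from_storage = read_from_storage  # hypothetical loader callback
        self.bricks = OrderedDict()                 # brick id -> data, oldest first
        self.storage_reads = 0

    def get(self, brick_id):
        if brick_id in self.bricks:
            # Step 2.3: found in the CPU cache, mark as most recently used.
            self.bricks.move_to_end(brick_id)
            return self.bricks[brick_id]
        # Step 2.4: not in the CPU cache, load from mass storage.
        data = self.read_from_storage(brick_id)
        self.storage_reads += 1
        self.bricks[brick_id] = data
        if len(self.bricks) > self.capacity:
            self.bricks.popitem(last=False)         # evict the least recently used
        return data


cache = RamBrickCache(2, lambda brick_id: f"brick {brick_id}")
for brick_id in (1, 2, 1, 3):
    cache.get(brick_id)
print(list(cache.bricks), cache.storage_reads)
```

In this run brick 2 is evicted when brick 3 arrives, because brick 1 was used more recently.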

  27. Pipeline 2.5 – Load bricks in a CUDA zero-copy buffer
      [Same pipeline diagram, steps 1, 2 and 2.1 to 2.5 marked]
