Understanding GPU performance: How to get peak FLOPS (GPU version)

  1. Understanding GPU performance: How to get peak FLOPS (GPU version), Kenjiro Taura

  2. Contents: (1) Data Access Performance


  4. Data access performance: data access performance matters on GPUs too

  5. Memory organization

     Pascal (P100)
     level           line size   capacity           associativity
     L1              32B         24KB/SM            ?
     L2              32B         4MB/device         ?
     Global Memory               12/16GB            N/A
     Shared Memory               64KB (*)           N/A

     Volta (V100)
     level           line size   capacity           associativity
     L1              32B         32-128KB/SM (*)    ?
     L2              32B         6MB/device         ?
     Global Memory               16GB               N/A
     Shared Memory               ≤ 96KB (*)         N/A

     (*) 128KB per SM is split between L1 and shared memory (configurable)
     source: https://arxiv.org/abs/1804.06826
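     For Volta, the footnote amounts to a per-SM capacity budget. The following uses C_L1 and C_shared for the two capacities. This notation is introduced here and does not come from the slides. Writing the budget out shows where the 32-128KB range for L1 in the table comes from:

```latex
C_{\mathrm{L1}} + C_{\mathrm{shared}} = 128\,\mathrm{KB},\qquad
0 \le C_{\mathrm{shared}} \le 96\,\mathrm{KB}
\;\Longrightarrow\;
32\,\mathrm{KB} \le C_{\mathrm{L1}} \le 128\,\mathrm{KB}
```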

  6. Global vs. shared memory
     - global memory and the L1/L2 caches are ordinary memory that form a hierarchy
     - cudaMalloc returns a pointer to global memory
     - accesses to global memory are transparently cached in L1/L2
     - shared memory is an explicitly managed scratch memory
       - its latency is shorter than L1's (especially on Pascal)
       - you explicitly move data between global and shared memory
       - data is shared only within a thread block
       - the programming interface is covered shortly

  7. Latency measurement: the same pointer-chasing experiment as we did on the CPU

```
for (long i = 0; i < N; i++) {
  p = p->next;
}
```

     Figure: N elements, each the size of a cache line, whose next pointers link all elements in a random order.
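     The following is a minimal CPU-side sketch of this experiment in C. It is an illustration under stated assumptions, not the deck's actual code. It assumes a 64-byte line, which is typical for CPUs; slide 5 lists 32B for the GPU caches. It also assumes an xorshift PRNG for the random order and POSIX clock_gettime for timing. The names build_random_cycle, chase and latency_ns are hypothetical. On the GPU, the same loop would presumably run in a single thread of a kernel and be timed with a device clock such as CUDA's clock64(), which would report cycles rather than nanoseconds.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime and CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Assumed line size: 64 B is typical on CPUs (slide 5 lists 32 B for GPU caches). */
#define LINE_SIZE 64

/* One list element, padded so that each element occupies one cache line. */
typedef struct elem {
    struct elem *next;
    char pad[LINE_SIZE - sizeof(struct elem *)];
} elem_t;

/* Written after each traversal so the compiler cannot drop the loop. */
static elem_t *volatile sink;

/* xorshift64 PRNG (an arbitrary choice for this sketch); *s must be nonzero. */
static uint64_t xorshift64(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Hypothetical helper: allocate n elements and link them into ONE cycle
   that visits every element in a random order. Returns NULL on failure. */
elem_t *build_random_cycle(size_t n, uint64_t seed) {
    assert(sizeof(elem_t) == LINE_SIZE); /* one element per line */
    if (n == 0) return NULL;
    elem_t *a = malloc(n * sizeof *a);
    size_t *perm = malloc(n * sizeof *perm);
    if (a == NULL || perm == NULL) {
        free(a);
        free(perm);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) perm[i] = i;
    uint64_t s = seed ? seed : 1;
    for (size_t i = n - 1; i > 0; i--) { /* Fisher-Yates shuffle */
        size_t j = (size_t)(xorshift64(&s) % (i + 1));
        size_t t = perm[i];
        perm[i] = perm[j];
        perm[j] = t;
    }
    /* perm[i] points to perm[i+1]; the last element wraps to the first */
    for (size_t i = 0; i < n; i++)
        a[perm[i]].next = &a[perm[(i + 1) % n]];
    free(perm);
    return a;
}

/* The measured loop: n_loads loads, each depending on the previous one. */
elem_t *chase(elem_t *p, long n_loads) {
    for (long i = 0; i < n_loads; i++)
        p = p->next;
    return p;
}

/* Hypothetical helper: average nanoseconds per load for a region of
   `bytes` bytes, or -1.0 on error. */
double latency_ns(size_t bytes, long n_loads) {
    if (n_loads <= 0) return -1.0;
    size_t n = bytes / sizeof(elem_t);
    elem_t *a = build_random_cycle(n, 12345);
    if (a == NULL) return -1.0;
    sink = chase(a, (long)n); /* warm-up pass over the whole list */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = chase(a, n_loads);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(a);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)n_loads;
}
```

     You could call latency_ns for region sizes from 1024 bytes up to (size_t)1 << 26 bytes, in steps of 4x. That sweep roughly mirrors the x-axis of the next plot. Every load depends on the result of the previous one, so the loads cannot overlap. The time per iteration therefore approximates load latency, not throughput.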

  8. Data size vs. latency: even an L1 cache hit takes 30 (Volta) to 100 (Pascal) cycles

     [Figure: latency per load in a random list traversal; x-axis: size of the region (bytes), 1024 to 6.71089 × 10^7 (1KB to 64MB); y-axis: latency/load (GPU cycles), 0 to 700; series labeled "p 8" and "v 8".]

  9. Shared memory
