
GPGPU 03: NVIDIA case study, GeForce 7800 (2006)

GeForce 7800: it is impossible to maximize throughput with such a rigid architecture, because you can't keep the vertex and fragment shading units busy all the time. As a result, many bottlenecks in the …
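A toy model makes the utilization argument concrete by comparing dedicated vertex/fragment units with a unified pool of the same size. The unit counts (8 vertex, 24 fragment), the workloads and the "one work item per unit per cycle" rule are hypothetical assumptions for illustration, not GeForce 7800 specifications.

```python
# Idealized throughput model (an assumption, not measured hardware data):
# each shading unit finishes one work item per cycle.

def ceil_div(a, b):
    return -(-a // b)

def fixed_cycles(vertex_work, fragment_work, vertex_units, fragment_units):
    """Dedicated units: each kind of work may only run on its own unit type,
    so the slower stage dictates the total time."""
    return max(ceil_div(vertex_work, vertex_units),
               ceil_div(fragment_work, fragment_units))

def unified_cycles(vertex_work, fragment_work, total_units):
    """Unified units: any unit may run either kind of work."""
    return ceil_div(vertex_work + fragment_work, total_units)

def busy_fraction(total_work, total_units, cycles):
    """Fraction of unit-cycles that did useful work."""
    return total_work / (total_units * cycles)

if __name__ == "__main__":
    VERTEX_UNITS, FRAGMENT_UNITS = 8, 24  # hypothetical split
    total_units = VERTEX_UNITS + FRAGMENT_UNITS
    for label, v, f in [("equal work", 6000, 6000), ("fragment-heavy", 800, 24000)]:
        fixed = fixed_cycles(v, f, VERTEX_UNITS, FRAGMENT_UNITS)
        unified = unified_cycles(v, f, total_units)
        print(f"{label}: fixed={fixed} cycles "
              f"({busy_fraction(v + f, total_units, fixed):.0%} busy), "
              f"unified={unified} cycles")
```

Whenever the workload's vertex/fragment ratio differs from the hardware's fixed split, the dedicated design leaves one group of units idle while the other is the bottleneck; the unified pool does not have this problem.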


  1. Turing SM
  ● Divided into 4 pipelines (processing blocks), each housing:
    ○ 16 FP32 cores
    ○ 16 INT32 cores
    ○ 2 Tensor Cores
    ○ 1 warp scheduler
    ○ 1 dispatch unit
  ● 96 KB L1/shared memory
    ○ 64 KB of it is used as "shader RAM" (per SM) when executing graphics workloads
  ● L0 instruction cache
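The per-SM totals follow directly from multiplying the per-pipeline counts by four. The helper names below are hypothetical, and the last helper assumes a 32-thread warp where each unit processes one thread per cycle.

```python
# Turing SM resources as listed on the slide (one SM, 4 pipelines).
PARTITIONS_PER_SM = 4
PER_PARTITION = {
    "fp32_cores": 16,
    "int32_cores": 16,
    "tensor_cores": 2,
    "warp_schedulers": 1,
    "dispatch_units": 1,
}
L1_SHARED_KB = 96
GRAPHICS_SHADER_RAM_KB = 64
WARP_SIZE = 32

def sm_totals():
    """Per-SM totals: each per-pipeline count multiplied by 4."""
    return {name: n * PARTITIONS_PER_SM for name, n in PER_PARTITION.items()}

def graphics_l1_split():
    """(shader RAM, remainder) in KB while the SM runs graphics work."""
    return GRAPHICS_SHADER_RAM_KB, L1_SHARED_KB - GRAPHICS_SHADER_RAM_KB

def cycles_per_warp_instruction(lanes):
    """Cycles to push one 32-thread warp instruction through `lanes` units,
    assuming each unit handles one thread per cycle."""
    return -(-WARP_SIZE // lanes)

if __name__ == "__main__":
    print(sm_totals())
    print(graphics_l1_split())
    print(cycles_per_warp_instruction(PER_PARTITION["fp32_cores"]))
```

With 16 FP32 units per pipeline, a full warp instruction occupies them for two cycles, which leaves room for the scheduler to overlap INT32 work in the meantime.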

  2. Memory latencies

  3. A*B + C
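A*B + C is presumably the Tensor Core operation: a fused matrix multiply-accumulate D = A×B + C on small tiles. The plain-Python sketch below emulates one such step. It assumes 4×4 tiles with FP16 inputs and FP32 accumulation, which is the mixed-precision mode described for Volta. The rounding model is an approximation, and real hardware may order and round the additions differently.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mma_4x4(a, b, c):
    """Emulate D = A*B + C on 4x4 tiles: inputs rounded to FP16,
    products summed into an FP32 accumulator (assumed rounding model)."""
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # An FP16 x FP16 product fits exactly in FP32 (11 + 11 <= 24 significand bits).
                acc = to_fp32(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            d[i][j] = acc
    return d

if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    b = [[0.1 * (4 * i + j) for j in range(4)] for i in range(4)]
    ones = [[1.0] * 4 for _ in range(4)]
    print(mma_4x4(identity, b, ones))
```

Multiplying by the identity makes the FP16 input rounding visible: entries such as 0.1 come back as the nearest half-precision value, not the original double.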

  4. Raytracing

  5. Raytracing “before”

  6. Raytracing “now”

  7. DXR

  8. Raytracing in practice
  ● Hybrid (rasterization + raytracing) solutions to minimize the number of rays
    ○ Low sample counts usually come with extreme noise, which has brought denoising to the forefront of research:
      ■ https://www.youtube.com/watch?v=5pxnDsFLAuY
      ■ https://research.nvidia.com/publication/interactive-reconstruction-monte-carlo-image-sequences-using-recurrent-denoising
      ■ https://www.youtube.com/watch?v=mtdRfl4fmvQ
  ● Acceleration structures considerably increase GPU memory usage
  ● Keep ray payloads as small as possible
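Here is why low sample counts are so noisy. A Monte Carlo pixel estimate has an error that shrinks only like 1/√N in the number of samples N, so halving the noise costs four times the rays. In the sketch below, the "scene" is a hypothetical pixel whose rays hit a light with probability 0.25. Noise is measured as the RMS error over many pixels.

```python
import math
import random

TRUE_VALUE = 0.25  # hypothetical: fraction of rays from this pixel that reach a light

def estimate_pixel(samples_per_pixel, rng):
    """Average of `samples_per_pixel` random ray contributions (1.0 = hit, 0.0 = miss)."""
    hits = sum(1.0 for _ in range(samples_per_pixel) if rng.random() < TRUE_VALUE)
    return hits / samples_per_pixel

def noise(samples_per_pixel, pixels=2000, seed=1):
    """RMS error of the estimate over many pixels, i.e. the visible noise level."""
    rng = random.Random(seed)
    squared_error = sum((estimate_pixel(samples_per_pixel, rng) - TRUE_VALUE) ** 2
                        for _ in range(pixels))
    return math.sqrt(squared_error / pixels)

if __name__ == "__main__":
    for spp in (1, 4, 16, 64, 256):
        # The theoretical value is sqrt(0.25 * 0.75 / spp).
        print(spp, round(noise(spp), 4), round(math.sqrt(0.1875 / spp), 4))
```

At 1 sample per pixel the error is roughly 0.43 around a true value of 0.25, far too noisy to display directly. This is the gap that denoisers are meant to close without paying for hundreds of rays per pixel.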

  9. Raytracing in practice

  10. DXR

  11. Mesh shader - motivation

  12. Mesh shaders

  13. Mesh shaders
  ● Task shader: threads are organized in workgroups; each task workgroup can launch an arbitrary number (including zero) of mesh shader workgroups
  ● Mesh shader: each thread of a mesh shader workgroup can create primitives
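The task → mesh amplification can be emulated on the CPU to show the control flow. Each task workgroup culls a group of meshlets and launches one mesh workgroup per survivor, possibly none. Each mesh workgroup then writes one primitive per thread. The `Meshlet` record, the bounding-sphere culling rule and the function names are hypothetical illustrations, not the actual shader API.

```python
from dataclasses import dataclass

@dataclass
class Meshlet:
    """Hypothetical meshlet: a small triangle cluster plus a bounding sphere."""
    center_z: float   # view-space depth of the sphere center (camera looks down +z)
    radius: float
    triangles: list   # (i0, i1, i2) vertex-index triples

def task_workgroup(meshlets):
    """One task workgroup: each 'thread' tests one meshlet, and the workgroup
    launches one mesh workgroup per surviving meshlet (possibly zero)."""
    # Drop meshlets whose bounding sphere lies entirely behind the camera.
    return [m for m in meshlets if m.center_z + m.radius >= 0.0]

def mesh_workgroup(meshlet):
    """One mesh workgroup: thread t writes primitive t of its meshlet."""
    primitives = [None] * len(meshlet.triangles)
    for thread_id, triangle in enumerate(meshlet.triangles):
        primitives[thread_id] = triangle
    return primitives

def draw(task_groups):
    """Run all task workgroups, then every mesh workgroup they launched.
    Returns (number of mesh workgroups launched, emitted primitives)."""
    launched, primitives = 0, []
    for group in task_groups:
        for meshlet in task_workgroup(group):
            launched += 1
            primitives.extend(mesh_workgroup(meshlet))
    return launched, primitives

if __name__ == "__main__":
    visible_group = [Meshlet(5.0, 1.0, [(0, 1, 2), (1, 2, 3)]),
                     Meshlet(-10.0, 1.0, [(4, 5, 6)])]
    culled_group = [Meshlet(-3.0, 1.0, [(7, 8, 9)])]
    print(draw([visible_group, culled_group]))
```

The second task workgroup launches zero mesh workgroups, so its geometry never reaches the rasterizer. This culling at the granularity of whole clusters is what the task stage adds over a classic vertex pipeline.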

  14. Mesh shaders

  15. Mesh shaders

  16. Mesh shaders

  17. Texture space shading
  ● Turing feature, only available via extensions (just like mesh shading)
  ● Store the shaded fragments of a triangle in a separate texture
  ● Shading becomes independent of visibility
  ● Re-sample this stored texture instead of re-evaluating the full shading
    ○ Unless the viewpoint has moved too much
  ● For certain applications the views are almost guaranteed to be roughly in the same place within a frame, e.g. the left and right eyes in VR
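The reuse idea can be sketched as a cache keyed by texel. The first view shades and stores the texels it touches, and a second, nearby view re-samples them. `expensive_shading` is a hypothetical stand-in for a real shading function. The two "eyes" are modeled as overlapping texel ranges, which is a simplifying assumption.

```python
def expensive_shading(texel):
    """Hypothetical stand-in for a costly per-texel shading computation."""
    u, v = texel
    return ((u * 31 + v * 17) % 255) / 255.0

class TextureSpaceCache:
    """Stores shaded texels so later views can re-sample instead of re-shading."""

    def __init__(self):
        self.shaded = {}       # texel coordinate -> shaded value
        self.shade_calls = 0   # how many times the expensive shader actually ran

    def sample(self, texel):
        if texel not in self.shaded:
            self.shade_calls += 1
            self.shaded[texel] = expensive_shading(texel)
        return self.shaded[texel]

def render_view(cache, visible_texels):
    """Produce one view by sampling the texels it sees through the cache."""
    return [cache.sample(t) for t in visible_texels]

if __name__ == "__main__":
    cache = TextureSpaceCache()
    left_eye = [(u, 0) for u in range(100)]
    right_eye = [(u, 0) for u in range(5, 105)]  # mostly overlaps the left eye
    render_view(cache, left_eye)
    render_view(cache, right_eye)
    print(cache.shade_calls, "shading evaluations for", len(left_eye) + len(right_eye), "samples")
```

This sketch never invalidates the cache. A real renderer must decide when the viewpoint (or the lighting) has changed too much for the stored shading to be reused.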

  18. Classic

  19. Texture space

  20. Texture space shading
  ● https://devblogs.nvidia.com/texture-space-shading/
  ● https://www.youtube.com/watch?v=Rpy0-q0TyB0

  21. References
  ● Fermi whitepaper:
    ○ http://www.nvidia.com/content/pdf/fermi_white_papers/p.glaskowsky_nvidia's_fermi-the_first_complete_gpu_architecture.pdf
    ○ http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf
  ● Kepler whitepaper:
    ○ https://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
  ● Maxwell whitepaper:
    ○ http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce-GTX-750-Ti-Whitepaper.pdf
    ○ http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF
  ● Pascal whitepaper:
    ○ http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_1080_Whitepaper_FINAL.pdf
    ○ https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
  ● Volta whitepaper:
    ○ http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
  ● Turing whitepaper:
    ○ https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

  22. References
  The lowest-level details are unfortunately only available via reverse-engineering:
  ● Volta: https://arxiv.org/abs/1804.06826
  ● Turing: https://arxiv.org/pdf/1903.07486.pdf

  23. Ampere

  24. In numbers
  ● 7 GPCs
  ● Each GPC contains:
    ○ 6 TPCs
    ○ 1 raster engine
    ○ (NEW) 2 ROP partitions
    ○ (NEW) 8 ROP units per ROP partition
  ● Each TPC contains:
    ○ 2 SMs
    ○ 1 polymorph engine
  ● Each SM contains:
    ○ 128 CUDA cores
    ○ 4 texture units
    ○ 4 Tensor Cores (3rd gen)
    ○ 1 RT Core (2nd gen)
    ○ 256 KB register file, partitioned into 4 × 64 KB parts
    ○ 128 KB of configurable L1/shared memory
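Multiplying out the hierarchy gives chip-wide totals for the full configuration on the slide. Shipping boards may have some of these units disabled. The constant and function names are hypothetical.

```python
# Full-chip hierarchy as listed on the slide.
GPCS = 7
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
ROP_PARTITIONS_PER_GPC = 2
ROPS_PER_PARTITION = 8
REGISTER_FILE_PARTITIONS = 4
PER_SM = {
    "cuda_cores": 128,
    "texture_units": 4,
    "tensor_cores": 4,
    "rt_cores": 1,
    "register_file_kb": 256,
    "l1_shared_kb": 128,
}

def chip_totals():
    """Scale every per-SM resource by the number of SMs and add GPC-level units."""
    tpcs = GPCS * TPCS_PER_GPC
    sms = tpcs * SMS_PER_TPC
    totals = {name: count * sms for name, count in PER_SM.items()}
    totals["tpcs"] = tpcs               # one polymorph engine per TPC
    totals["sms"] = sms
    totals["rops"] = GPCS * ROP_PARTITIONS_PER_GPC * ROPS_PER_PARTITION
    return totals

if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(f"{name}: {value}")
    print("register file per SM partition (KB):",
          PER_SM["register_file_kb"] // REGISTER_FILE_PARTITIONS)
```

Because ROPs now sit inside the GPCs, the ROP count scales with the number of GPCs (7 × 2 × 8) instead of being tied to the memory controllers.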

  25. In numbers
  ● 12 x 32-bit memory controllers (384-bit bus in total)
  ● 512 KB of L2 cache per controller (6144 KB in total)
  ● An SM partition can now process two FP32 operations concurrently (in Turing, it could only dual-issue an FP32/INT32 operation pair)
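The sketch below checks the memory-system arithmetic and shows what the extra FP32 path means for a partition. It uses an idealized lane-count model, which is an assumption that ignores scheduling, latency and register-bank limits. In this model, a Turing partition has 16 FP32 lanes and 16 INT32 lanes, while an Ampere partition has 16 FP32-only lanes plus 16 lanes that run either FP32 or INT32. Work is counted in thread-operations.

```python
MEMORY_CONTROLLERS = 12
CONTROLLER_WIDTH_BITS = 32
L2_PER_CONTROLLER_KB = 512

def bus_width_bits():
    return MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS

def l2_total_kb():
    return MEMORY_CONTROLLERS * L2_PER_CONTROLLER_KB

def turing_partition_cycles(fp32_ops, int32_ops):
    """Idealized Turing partition: 16 FP32 lanes and 16 INT32 lanes in parallel,
    so the busier of the two lane groups sets the time."""
    return max(fp32_ops / 16, int32_ops / 16)

def ampere_partition_cycles(fp32_ops, int32_ops):
    """Idealized Ampere partition: 16 FP32-only lanes plus 16 FP32/INT32 lanes.
    INT32 work is limited to the shared lanes; all work shares 32 lanes."""
    return max(int32_ops / 16, (fp32_ops + int32_ops) / 32)

if __name__ == "__main__":
    print("bus width:", bus_width_bits(), "bits; L2:", l2_total_kb(), "KB")
    for fp, it in [(3200, 0), (2400, 800), (1600, 1600)]:
        print(f"FP32={fp} INT32={it}: Turing {turing_partition_cycles(fp, it)} cycles, "
              f"Ampere {ampere_partition_cycles(fp, it)} cycles")
```

Under this model, pure FP32 code runs up to twice as fast per partition. A 50/50 FP32/INT32 mix gains nothing, since the shared lanes are already saturated with integer work.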
