SLIDE 8 ...mapped onto the GPU
[Figure: CPU processes A to F mapped onto the GPU, plotted as GPU utilization (%) over time.
Fermi without Hyper-Q: temporal division on each SM.
Kepler with Hyper-Q: simultaneous multiprocessing on each SMX, with the saved time marked.]
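The overlap shown above can be sketched from a single CPU process by using CUDA streams in place of separate processes. This is a minimal sketch, not the setup behind the figure: the kernel `scale`, the helper `run_independent_streams`, and the `CHECK` macro are hypothetical names introduced here. Whether the kernels actually run simultaneously depends on the hardware and on the resources each launch needs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking macro: abort on the first CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical small kernel: one launch alone does not fill the GPU.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches one independent kernel per stream and returns the number
// of streams whose output is correct.
int run_independent_streams(int num_streams, int n)
{
    std::vector<cudaStream_t> streams(num_streams);
    std::vector<float *> dev(num_streams, nullptr);
    std::vector<float> ones(n, 1.0f);
    const size_t bytes = n * sizeof(float);

    for (int s = 0; s < num_streams; ++s) {
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaMalloc(&dev[s], bytes));
        CHECK(cudaMemcpy(dev[s], ones.data(), bytes, cudaMemcpyHostToDevice));
    }

    // Work placed in different streams has no ordering between streams,
    // so the hardware is free to overlap these launches.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int s = 0; s < num_streams; ++s) {
        scale<<<blocks, threads, 0, streams[s]>>>(dev[s], n, float(s + 1));
    }
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Stream s multiplied an array of ones by (s + 1).
    int correct = 0;
    std::vector<float> back(n);
    for (int s = 0; s < num_streams; ++s) {
        CHECK(cudaMemcpy(back.data(), dev[s], bytes, cudaMemcpyDeviceToHost));
        bool ok = true;
        for (int i = 0; i < n; ++i) ok = ok && (back[i] == float(s + 1));
        if (ok) ++correct;
        CHECK(cudaFree(dev[s]));
        CHECK(cudaStreamDestroy(streams[s]));
    }
    return correct;
}
```

Calling `run_independent_streams(4, 1024)` from a host `main` should report four correct streams. A profiler timeline is the usual way to see whether the launches overlapped on a given GPU.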
A look ahead: The Echelon execution model
Swift operations:
Thread array creation.
Messages.
Block transfers.
Collective operations.
[Figure: Echelon execution model. Threads and objects (A, B) share a global address space over the memory hierarchy and interact through load/store accesses, active messages, and bulk transfers.]
Conclusions
Kepler represents a new generation of GPU hardware, deploying thousands of cores so that large-scale applications can benefit from the CUDA model. Major enhancements:
The GPU becomes more autonomous: it can create threads by itself.
Thread scheduling becomes more efficient, particularly for tiny threads.
Caches are larger and memory bandwidth is higher, also between GPUs.
The GPU can now handle much larger problem sizes and deploy more parallelism.
Biomedicine and image processing are two fields that can benefit the most from these enhancements.
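The first enhancement, a GPU that creates threads by itself, corresponds to what CUDA calls dynamic parallelism: a kernel launching another kernel without returning to the CPU. The sketch below is illustrative only. The kernels `parent` and `child`, the helper `run_dynamic_parallelism`, and the `CHECK` macro are hypothetical names. It assumes a device of compute capability 3.5 or higher and a build with relocatable device code (`nvcc -rdc=true`, linked against `cudadevrt`).

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking macro: abort on the first CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Child grid: fills one chunk of the output with its global indices.
__global__ void child(int *out, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[offset + i] = offset + i;
}

// Parent grid: each thread launches a child grid for its own chunk,
// so the extra parallelism is created on the GPU itself.
__global__ void parent(int *out, int chunks, int chunk_size)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c < chunks) {
        int threads = 128;
        int blocks = (chunk_size + threads - 1) / threads;
        child<<<blocks, threads>>>(out, c * chunk_size, chunk_size);
    }
}

// Runs the parent kernel and returns how many elements hold the
// expected value (their own index).
int run_dynamic_parallelism(int chunks, int chunk_size)
{
    const int n = chunks * chunk_size;
    const size_t bytes = n * sizeof(int);
    int *dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));
    CHECK(cudaMemset(dev, 0xFF, bytes));  // every int starts as -1

    const int threads = 32;
    parent<<<(chunks + threads - 1) / threads, threads>>>(dev, chunks, chunk_size);
    CHECK(cudaGetLastError());
    // The parent grid is not complete until all of its child grids finish.
    CHECK(cudaDeviceSynchronize());

    std::vector<int> host(n);
    CHECK(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dev));

    int correct = 0;
    for (int i = 0; i < n; ++i) correct += (host[i] == i);
    return correct;
}
```

For example, `run_dynamic_parallelism(8, 256)` launches eight child grids from the device and should report all 2048 elements as correct.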
Thanks for your attention!
You can always reach me at:
Email: ujaldon@uma.es
Web page at the University of Malaga:
http://manuel.ujaldon.es
Web site at Nvidia:
http://research.nvidia.com/users/manuel-ujaldon
For additional information, read the Kepler whitepaper:
http://www.nvidia.com/object/nvidia-kepler.html
To watch the official talk given at GTC'12 (webinar):
http://www.gputechconf.com/gtcnew/on-demand-gtc.php#1481
To listen to additional Nvidia material about GPU computing:
http://www.nvidia.com/object/webinar.html