Lecture 7: Single Node Architectures
Abhinav Bhatele, Department of Computer Science
High Performance Computing Systems (CMSC714)
Summary of last lecture
Task-based programming models and Charm++
Key principles: Over-decomposition,
Abhinav Bhatele, CMSC714
https://en.wikipedia.org/wiki/Von_Neumann_architecture
Uniform Memory Access (UMA) vs. Non-uniform Memory Access (NUMA)
https://frankdenneman.nl/2016/07/07/numa-deep-dive-part-1-uma-numa/
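The practical difference between the two designs can be sketched with a toy latency model: on a NUMA node, the average memory access time depends on how many accesses hit memory attached to the local socket. The function name and both latency values below are hypothetical and chosen only for illustration; they are not measurements of any real machine.

```python
def effective_latency_ns(local_fraction, local_ns, remote_ns):
    """Average access latency when a fraction of accesses is socket-local.

    Toy model: a weighted average of local and remote latencies.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, for illustration only.
LOCAL_NS = 80.0    # access to memory attached to the same socket
REMOTE_NS = 140.0  # access to memory attached to the other socket

if __name__ == "__main__":
    for frac in (1.0, 0.5, 0.0):
        avg = effective_latency_ns(frac, LOCAL_NS, REMOTE_NS)
        print(f"local fraction {frac:.1f}: {avg:.1f} ns")
```

In this model a UMA system is the special case where local and remote latencies are equal, so data placement has no effect on the average.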
Node diagram (figure not reproduced): two P9 CPUs, each with 256 GB of DRAM and three GPUs (7 TF and 16 GB of HBM per GPU).

Node totals:
  TF      42 TF (6 x 7 TF)
  HBM     96 GB (6 x 16 GB)
  DRAM    512 GB (2 x 16 x 16 GB)
  NET     25 GB/s (2 x 12.5 GB/s)
  MMsg/s  83

Legend: NIC, HBM/DRAM bus (aggregate B/W), NVLink, X-Bus (SMP), PCIe Gen4, EDR IB.
HBM & DRAM speeds are aggregate (Read+Write). All other speeds (X-Bus, NVLink, PCIe, IB) are bi-directional.
Bandwidth labels in the diagram: NVM 6.0 GB/s read and 2.2 GB/s write; 12.5 GB/s (x2); 16 GB/s (x2); 64 GB/s; 135 GB/s (x2); 50 GB/s (x10); 900 GB/s (x6).
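As a quick check, the node totals can be recomputed from the per-component figures on the slide. This is a minimal sketch: the constant and function names are made up for illustration, and the DRAM factors are taken verbatim from the "2x16x16 GB" breakdown, whose middle factor the slide does not explain.

```python
# Per-component figures as they appear on the slide.
GPUS_PER_NODE = 6
TF_PER_GPU = 7
HBM_GB_PER_GPU = 16
DRAM_FACTORS_GB = (2, 16, 16)  # "2x16x16 GB"; meaning of each factor not stated
NET_LINKS = 2
NET_GBPS_PER_LINK = 12.5


def node_totals():
    """Return the aggregate per-node figures derived from the parts."""
    dram_gb = 1
    for factor in DRAM_FACTORS_GB:
        dram_gb *= factor
    return {
        "TF": GPUS_PER_NODE * TF_PER_GPU,
        "HBM_GB": GPUS_PER_NODE * HBM_GB_PER_GPU,
        "DRAM_GB": dram_gb,
        "NET_GBps": NET_LINKS * NET_GBPS_PER_LINK,
    }


if __name__ == "__main__":
    for name, value in node_totals().items():
        print(f"{name}: {value}")
```

The computed values match the totals listed on the slide (42 TF, 96 GB HBM, 512 GB DRAM, 25 GB/s network).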
The World’s Most Advanced Data Center GPU
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
The IBM Blue Gene/Q Compute Chip
would it compare today?
law apply to the GPUs, do they get 2x faster every 2 years?
differentiated (chosen) for a purpose?
programming languages? (Java has mergesort, Python has Timsort, C++ implements quicksort.) Is it used more in HPC?
Debunking the 100X GPU vs. CPU myth
Abhinav Bhatele 5218 Brendan Iribe Center (IRB) / College Park, MD 20742 phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu