  1. Tianhe-3 and the Exascale Road in China
     Ruibo WANG, National University of Defense Technology

  2. Contents ❑ NUDT & TianHe ❑ the Exascale Road in China ❑ Tianhe-3

  3. Contents ❑ NUDT & TianHe ❑ the Exascale Road in China ❑ Tianhe-3

  4. NUDT & Tianhe
     ❑ NUDT
       ❑ 1953: originally founded in Harbin
       ❑ 1970: moved to Changsha
       ❑ 1978: renamed National University of Defense Technology

  5. NUDT & Tianhe
     ❑ Galaxy-I
       ❑ 1983, the 1st supercomputer in China
       ❑ peak performance: 100 Mflops
       ❑ project started in 1978; widely used in oil exploration and weather forecasting

  6. NUDT & Tianhe
     ❑ Galaxy-I
       ❑ 1983, 100 Mflops
     ❑ Galaxy-II
       ❑ 1994, Gflops level
       ❑ vector architecture
     ❑ Galaxy-III
       ❑ 1997, 13 Gflops
       ❑ MPP architecture, MIPS CPUs

  7. NUDT & Tianhe
     ❑ TianHe-1, deployed in 2009, 1.2 Pflops
       ❑ ranked No.1 in China
       ❑ ranked No.5 in the Top500 (Nov. 2009)
     ❑ TianHe-1A, deployed in 2010, 4.7 Pflops
       ❑ ranked No.1 in the Top500 (Nov. 2010)

  8. NUDT & Tianhe
     ❑ TianHe-1A, deployed in 2010, 4.7 Pflops
       ❑ ranked No.1 in the Top500 (Nov. 2010)
       ❑ the first time a Chinese system took the No.1 spot
       ❑ deployed in the National Supercomputer Center in Tianjin

  9. NUDT & Tianhe
     ❑ TianHe-2 made its pre-release at IHPCF 2013
       ❑ International High Performance Computing Forum
       ❑ http://www.ihpcf.org/
       ❑ Changsha, May 2013

  10. NUDT & Tianhe
     ❑ TianHe-2 ranked No.1 from Jun. 2013 to Nov. 2015
       ❑ No.1 for 3 years (6 consecutive Top500 lists: Jun. 2013 Leipzig, Nov. 2013 Denver, Jun. 2014 Leipzig, Nov. 2014 New Orleans, Jun. 2015 Frankfurt, Nov. 2015 Austin)
       ❑ Peak 55 Pflops, Linpack 33.86 Pflops

  11. NUDT & Tianhe
     ❑ TianHe-2
       ❑ 16,000 compute nodes
       ❑ Frame: 32 compute nodes
       ❑ Rack: 4 compute frames
       ❑ Whole system: 125 racks (32 nodes x 4 frames x 125 racks = 16,000 nodes)

  12. NUDT & Tianhe
     ❑ TianHe-2 background
       ❑ sponsored by the 863 High-Tech Program, the Government of Guangdong Province, and the Government of Guangzhou City
       ❑ deployed in the National Supercomputer Center in Guangzhou (NSCC-GZ)
       ❑ Oct. 2013: the Phase 1 system was moved to NSCC-GZ

  13. NUDT & Tianhe ❑ Jan. 2014: Tianhe-2 began providing service at NSCC-GZ

  14. NUDT & Tianhe
     ❑ Originally planned to finish the Phase 2 upgrade in 2015
       ❑ use the new-generation KNL to replace the KNC
       ❑ peak performance would reach 100 Pflops
     ❑ In early 2015, for various reasons, we turned instead to the homegrown accelerator to upgrade the system
     ❑ The Phase 2 system was ready at the end of 2017

  15. NUDT & Tianhe
     ❑ Comparison of Tianhe-2 & Tianhe-2A

                                Tianhe-2                 Tianhe-2A
       Nodes & performance      16,000 nodes             17,792 nodes
                                Intel CPU + KNC          Intel CPU + Matrix-2000
                                54.9 Pflops              100.68 Pflops
       Interconnect             10 Gbps, 1.57 us         14 Gbps, 1 us
       Memory                   1.4 PB                   3 PB
       Storage                  12.4 PB, 512 GB/s        19 PB, 1 TB/s
       Energy efficiency        17.8 MW, 1.9 Gflops/W    18.5 MW, 5.4 Gflops/W
       Programming environment  MPSS for Intel KNC       OpenMP/OpenCL for Matrix-2000

  16. NUDT & Tianhe
     ❑ Matrix-2000
       ❑ 4 super-nodes (SNs), 8 clusters per SN, 4 cores per cluster, connected by an on-chip interconnect
       ❑ Core
         ❑ self-defined 256-bit vector ISA
         ❑ 16 DP flops/cycle per core
       ❑ Peak performance: 2.4576 Tflops @ 1.2 GHz
         (4 SNs x 8 clusters x 4 cores x 16 flops x 1.2 GHz = 2.4576 Tflops)
       ❑ Power: ~240 W
       ❑ 8 DDR4-2400 channels
       ❑ x16 PCIe Gen3
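
     The peak figure above is simply the product of the hierarchy listed on this slide. As a quick illustration (the constants are the slide's own; nothing else is assumed), the same arithmetic in C:

        #include <stdio.h>

        int main(void) {
            /* Matrix-2000 topology and per-core throughput, as listed on the slide */
            const int    supernodes        = 4;    /* super-nodes per chip        */
            const int    clusters_per_sn   = 8;    /* clusters per super-node     */
            const int    cores_per_cluster = 4;    /* cores per cluster           */
            const int    flops_per_cycle   = 16;   /* DP flops per cycle per core */
            const double freq_ghz          = 1.2;  /* clock frequency in GHz      */

            int cores = supernodes * clusters_per_sn * cores_per_cluster;      /* 128 cores */
            double peak_tflops = cores * flops_per_cycle * freq_ghz / 1000.0;

            printf("cores = %d, peak = %.4f Tflops\n", cores, peak_tflops);    /* 2.4576 */
            return 0;
        }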

  17. NUDT & Tianhe
     ❑ Heterogeneous compute nodes
       ❑ Intel Xeon CPU x2
       ❑ Matrix-2000 x2
       ❑ Memory: 192 GB
       ❑ Interconnect: 14 Gbps homegrown network
       ❑ Peak performance: 5.34 Tflops

  18. NUDT & Tianhe
     ❑ Heterogeneous compute blades
       ❑ Compute blade = Xeon part + Matrix-2000 part (4 Intel Xeon CPUs + 4 Matrix-2000, forming 2 compute nodes)
       ❑ the Matrix-2000 part replaces the KNC part

  19. NUDT & Tianhe
     ❑ Heterogeneous programming environment
       ❑ supports OpenMP 4.x and OpenCL
       ❑ software stack spans the Xeon host and the Matrix-2000 device: OpenMP/OpenCL compilers and runtimes, a heterogeneous computing library, a math library, a symmetric communication library, and drivers on top of the host OS and device OS
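
     The slide names OpenMP 4.x as the directive-based path onto the accelerator but shows no code. Below is a minimal, illustrative OpenMP 4.x offload kernel of the general kind such an environment compiles; the device number, the array size, and the host-fallback behaviour are assumptions for the sketch, not details from the talk.

        #include <stdio.h>

        #define N 1024

        int main(void) {
            static double x[N], y[N];
            for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

            /* Map x and y to the attached accelerator (device 0 is assumed;
             * on Tianhe-2A the target device would be the Matrix-2000),
             * run the loop across its cores, and copy y back.            */
            #pragma omp target map(to: x) map(tofrom: y) device(0)
            #pragma omp teams distribute parallel for
            for (int i = 0; i < N; i++)
                y[i] += 2.5 * x[i];

            printf("y[0] = %f\n", y[0]);   /* expect 4.5 */
            return 0;
        }

     In common OpenMP implementations the target region falls back to the host when no device is available, so the same sketch also runs on a plain Xeon node.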

  20. Contents ❑ NUDT & TianHe ❑ the Exascale Road in China ❑ Tianhe-3

  21. Next step: Exascale
     ❑ Governments are targeting exascale computing
       ❑ US, Japan, EU, China
     ❑ China has reached the 100P level, but exascale poses far greater challenges
       ❑ memory wall
       ❑ communication wall
       ❑ reliability wall
       ❑ energy consumption wall
       ❑ etc.

  22. More Walls for China
     ❑ Microelectronics & chip industry
       ❑ still at an early stage of development
       ❑ calls for more technology accumulation
     ❑ Various & complex needs
       ❑ huge & highly diverse market
       ❑ calls for multiple design & development roads
     ❑ Self-controllable road
       ❑ processor
       ❑ platform & OS
       ❑ applications
       ❑ eco-system

  23. China’s Development

  24. National Projects & Plans in China
     ❑ Since 1990, China has launched an HPC project in every 5-year plan, sponsored by the 863 High-Tech Program of the Ministry of Science & Technology
     ❑ The 10th 5-year plan (2001~2005)
       ❑ Project: high-performance computer and software system
       ❑ Targets: TFlops supercomputer and high-performance computing environment
       ❑ Successfully developed TF-scale computers and the China National Grid (CNGrid) testbed
     ❑ The 11th 5-year plan (2006~2010)
       ❑ Project: high-productivity computer and network computing environment
       ❑ Targets: PFlops supercomputer and grid computing environment
       ❑ Successfully developed peta-scale computers and upgraded CNGrid into the national HPC service environment

  25. National Projects & Plans in China
     ❑ The 12th 5-year plan (2011~2015)
       ❑ Project: high-productivity computer and computing environment
       ❑ Targets: 100 PFlops supercomputer and cloud computing environment
       ❑ Developed world-class computer systems
         ❑ Tianhe-2
         ❑ Sunway TaihuLight
     ❑ The 13th 5-year plan (2016~2020)
       ❑ Project: exascale system
       ❑ Targets: key technologies for an EFlops supercomputer

  26. The 13th 5-year plan (2016~2020)
     ❑ Goals
       ❑ develop self-reliant and controllable core technologies for exascale computing, and maintain China’s leading position
       ❑ develop a series of critical HPC applications and software centers, building the HPC application eco-system
       ❑ build a national HPC environment with world-class resources and services
     ❑ Two steps to exascale
       ❑ support vendors to develop prototypes (2016-2018)
       ❑ choose and support vendors to achieve exascale

  27. Exascale goals in the 2016 proposal
     ❑ System performance: 1 Eflops
     ❑ Node performance: > 10 Tflops
     ❑ Network bandwidth: > 400 Gbps
     ❑ Network scale: more than 100,000 nodes
     ❑ MPI latency: < 1.2 us
     ❑ Linpack efficiency: > 60%
     ❑ Power efficiency: > 30 Gflops/W
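
     Taken together, these targets pin down a rough system envelope. The figures below (node count, sustained Linpack, total power) are back-of-the-envelope arithmetic derived from the listed goals, not numbers given in the talk:

        #include <stdio.h>

        int main(void) {
            /* Targets from the 2016 exascale proposal */
            const double system_eflops   = 1.0;   /* system peak, Eflops      */
            const double node_tflops     = 10.0;  /* > 10 Tflops per node     */
            const double linpack_eff     = 0.60;  /* > 60% Linpack efficiency */
            const double gflops_per_watt = 30.0;  /* > 30 Gflops/W            */

            double nodes      = system_eflops * 1.0e6 / node_tflops;   /* peak Tflops / Tflops per node */
            double linpack_pf = system_eflops * 1000.0 * linpack_eff;  /* sustained Pflops              */
            double power_mw   = system_eflops * 1.0e9 / gflops_per_watt / 1.0e6;

            printf("nodes required : %.0f\n", nodes);              /* ~100,000 nodes */
            printf("Linpack        : %.0f Pflops\n", linpack_pf);  /* ~600 Pflops    */
            printf("power budget   : %.1f MW\n", power_mw);        /* ~33.3 MW       */
            return 0;
        }

     The ~100,000-node figure matches the stated network-scale target, and ~33 MW is the power budget the 30 Gflops/W goal implies for a full exaflop.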

  28. Vendors in China
     ❑ University
       ❑ NUDT
         ❑ homegrown CPU, accelerator, and interconnect
     ❑ Institute
       ❑ National Research Center of Parallel Computer Engineering and Technology (NRCPC)
         ❑ homegrown many-core CPU
     ❑ Company
       ❑ Dawning (Sugon)
         ❑ various product lines besides HPC: servers, PCs, data-center products, etc.
         ❑ large share of the market

  29. NUDT exascale prototype system deployed in the National Supercomputer Center in Tianjin, 2018

  30. NUDT exascale prototype system
     ❑ 512 nodes, each with
       ❑ 3 MT-2000+ processors
       ❑ 6 Tflops peak performance
     ❑ Matrix-2000+
       ❑ 128 cores @ 2 GHz
       ❑ 2 Tflops peak
       ❑ ~130 W, ~15 Gflops/W
     ❑ 400 Gbps homegrown network
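
     The per-processor and per-node figures above are mutually consistent, and they also fix the size of the whole prototype. A short check (the system peak here is derived, not a number given on the slide):

        #include <stdio.h>

        int main(void) {
            /* NUDT prototype figures from the slide */
            const int    nodes           = 512;
            const int    procs_per_node  = 3;      /* MT-2000+ per node    */
            const double tflops_per_proc = 2.0;    /* peak per MT-2000+    */
            const double watts_per_proc  = 130.0;  /* ~130 W per processor */

            double node_tflops   = procs_per_node * tflops_per_proc;           /* 6 Tflops              */
            double system_pflops = nodes * node_tflops / 1000.0;               /* ~3.1 Pflops (derived) */
            double gflops_per_w  = tflops_per_proc * 1000.0 / watts_per_proc;  /* ~15.4 Gflops/W        */

            printf("node peak   : %.1f Tflops\n", node_tflops);
            printf("system peak : %.2f Pflops\n", system_pflops);
            printf("efficiency  : %.1f Gflops/W\n", gflops_per_w);
            return 0;
        }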

  31. NUDT exascale prototype system ❑ Air and water hybrid cooling ❑ PUE < 1.15 ❑ High density

  32. NRCPC exascale prototype system
     ❑ SW26010 CPU
       ❑ used in the Sunway TaihuLight system
     ❑ 512 nodes, 2 CPUs per node
     ❑ Homegrown network

  33. Sugon exascale prototype system

  34. Sugon exascale prototype system ❑ Heterogeneous architecture ❑ Hygon CPU + DCU ❑ 6D torus network

  35. Sugon exascale prototype system
     ❑ Hierarchy
       ❑ 512 nodes
       ❑ 32 supernodes
       ❑ 6 silicon units
       ❑ 1 silicon cube
     ❑ Cooling
       ❑ total immersion cooling
       ❑ no noise
       ❑ better heat-exchange performance

  36. Exascale prototype systems
     ❑ Compute
       ❑ traditional multi-core CPU
       ❑ many-core CPU
       ❑ CPU + DCU
     ❑ Network
       ❑ homegrown interconnect network
       ❑ commercial network
     ❑ Cooling
       ❑ air & water hybrid cooling
       ❑ traditional water cooling
       ❑ total immersion cooling

  37. Contents ❑ NUDT & TianHe ❑ the Exascale Road in China ❑ Tianhe-3
