Architecture-aware Algorithms and Software for Peta and Exascale Computing
Jack Dongarra
University of Tennessee / Oak Ridge National Laboratory / University of Manchester
Outline
• Overview of High Performance Computing
• Look at … (Intel's Knights Landing)
• … (computers are used in industry)
[Chart: TOP500 performance development, 1994-2015, log scale from 100 Mflop/s to 1 Eflop/s; curves for SUM, N=1, and N=500. At the start of the list: #1 at 59.7 GFlop/s, #500 at 400 MFlop/s, SUM at 1.17 TFlop/s; on the latest list: #1 at 33.9 PFlop/s, #500 at 206 TFlop/s, SUM at 420 PFlop/s. Roughly a 6-8 year lag separates the #1 and #500 curves. For reference: my laptop is about 70 Gflop/s, my iPhone about 4 Gflop/s.]
Rank | Site | Computer | Country | Cores | Rmax [Pflops] | % of Peak | Power [MW] | MFlops/Watt
1 | National Super Computer Center in Guangzhou | Tianhe-2, NUDT, Xeon 12C + Intel Xeon Phi (57c) + Custom | China | 3,120,000 | 33.9 | 62 | 17.8 | 1905
2 | DOE / OS, Oak Ridge Nat Lab | Titan, Cray XK7, AMD (16C) + Nvidia Kepler GPU (14c) + Custom | USA | 560,640 | 17.6 | 65 | 8.3 | 2120
3 | DOE / NNSA, Livermore Nat Lab | Sequoia, BlueGene/Q (16c) + Custom | USA | 1,572,864 | 17.2 | 85 | 7.9 | 2063
4 | RIKEN Advanced Inst for Comp Sci | K computer, Fujitsu SPARC64 VIIIfx (8c) + Custom | Japan | 705,024 | 10.5 | 93 | 12.7 | 827
5 | DOE / OS, Argonne Nat Lab | Mira, BlueGene/Q (16c) + Custom | USA | 786,432 | 8.16 | 85 | 3.95 | 2066
6 | DOE / NNSA, Los Alamos & Sandia | Trinity, Cray XC40, Xeon 16C + Custom | USA | 301,056 | 8.10 | 80 | |
7 | Swiss CSCS | Piz Daint, Cray XC30, Xeon 8C + Nvidia Kepler (14c) + Custom | Switzerland | 115,984 | 6.27 | 81 | 2.3 | 2726
8 | HLRS Stuttgart | Hazel Hen, Cray XC40, Xeon 12C + Custom | Germany | 185,088 | 5.64 | 76 | |
9 | KAUST | Shaheen II, Cray XC40, Xeon 16C + Custom | Saudi Arabia | 196,608 | 5.54 | 77 | 2.8 | 1954
10 | Texas Advanced Computing Center | Stampede, Dell Intel (8c) + Intel Xeon Phi (61c) + IB | USA | 204,900 | 5.17 | 61 | 4.5 | 1489
500 (368) | Regensburg | Eurotech Intel | Germany | 15,872 | 0.206 | 95 | |
Intel Xeon: 8 cores, 3 GHz, 8 x 4 ops/cycle = 96 Gflop/s (DP)
Nvidia K20X "Kepler": 2688 CUDA cores, 0.732 GHz, 2688 x 2/3 ops/cycle = 1.31 Tflop/s (DP)
Commodity plus Accelerator (GPU)
Interconnect: PCI-e Gen2/3, 16 lanes, 64 Gb/s (8 GB/s), 1 GW/s. GPU memory: 6 GB. 192 CUDA cores per SMX; 14 SMX units give the 2688 "CUDA cores".
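Spelling out the peak-rate arithmetic behind these numbers (a restatement of the figures above, not new data):

$$\text{peak} = \text{cores} \times \text{clock} \times \text{flops/cycle per core}$$
$$\text{Xeon: } 8 \times 3\,\text{GHz} \times 4 = 96\ \text{Gflop/s (DP)} \qquad \text{K20X: } 2688 \times 0.732\,\text{GHz} \times \tfrac{2}{3} \approx 1.31\ \text{Tflop/s (DP)}$$

For the K20X the factor 2/3 reflects that double-precision throughput comes from one third of the CUDA cores, each doing a 2-flop fused multiply-add per cycle.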
[Chart: number of TOP500 systems with accelerators, 2006-2015; categories include NVIDIA (Kepler), Intel Xeon Phi, ATI Radeon, IBM Cell, Clearspeed, and PEZY-SC.]
• US DOE planning to deploy three O(100) Pflop/s systems for 2017-2018 ($525M hardware)
  - Oak Ridge Lab and Lawrence Livermore Lab to receive IBM and Nvidia based systems
  - Argonne Lab to receive an Intel based system
  - After this: Exaflops
• US Dept of Commerce is preventing some Chinese supercomputing centers from buying Intel parts
  - Citing concerns about nuclear research being done with the systems; February 2015
  - On the blockade list:
    - National SC Center Guangzhou, site of Tianhe-2
    - National SC Center Tianjin, site of Tianhe-1A
    - National University for Defense Technology, developer
    - National SC Center Changsha, location of NUDT
• For the first time, < 50% of the Top500 systems are in the U.S.
  - 201 of the systems are U.S.-based; China is #2 with 109.
Absolute counts: US 201, China 109, Japan 38, Germany 32, UK 18, France 18.
China nearly tripled its number of systems on the latest list, while the number of systems in the US has fallen to the lowest point since the TOP500 list was created.
In Italy: 2 - Exploration & Production - Eni S.p.A. 2 - CINECA
Moore's Law: 2X transistors/chip every 1.5 years (the number of devices per chip doubles every 18 months).
Gordon Moore (co-founder of Intel), Electronics Magazine, 1965.
Microprocessors have become smaller, denser, and more powerful. Not just processors: bandwidth, storage, etc. 2X memory and processor speed, and ½ the size, cost, and power, every 18 months.
"Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions"
Robert H. Dennard, Fritz H. Gaensslen, Hwa-Nien Yu, V. Leo Rideout, Ernest Bassous, and Andre R. LeBlanc
[Dennard, Gaensslen, Yu, Rideout, Bassous, LeBlanc; IEEE Journal of Solid-State Circuits, Vol. SC-9, No. 5, October 1974]
Dennard Scaling:
Decrease feature size by a factor of λ and decrease voltage by a factor of λ; then
• transistors per unit area increase by λ²
• clock speed increases by λ
• power density stays constant
Moore's Law put lots more transistors on a chip… but it's Dennard's Law that made them useful.
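A worked restatement (assuming the usual dynamic-power model, P ≈ C·V²·f per transistor) of why constant-field scaling kept power density flat:

$$P \approx C V^2 f,\qquad C \to \frac{C}{\lambda},\quad V \to \frac{V}{\lambda},\quad f \to \lambda f \;\;\Rightarrow\;\; P \to \frac{C}{\lambda}\cdot\frac{V^2}{\lambda^2}\cdot\lambda f = \frac{P}{\lambda^2}$$

$$\text{transistor area} \to \frac{A}{\lambda^2} \;\;\Rightarrow\;\; \frac{P}{A}\ \text{(power density) is unchanged, even as transistors per chip grow by }\lambda^2.$$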
The breakdown of Dennard scaling is the result of small feature sizes: current leakage poses greater challenges and also causes the chip to heat up.
[Chart based on CPU DB data; vendor color key: Intel green, IBM orange, AMD pink, Fujitsu red, Sun brown, DEC salmon, MIPS blue, Centaur gray.]
Data: CPU DB: Recording Microprocessor History, CACM, Vol. 55, No. 4, 2012, http://dl.acm.org/citation.cfm?id=2133822
The primary reason cited for the breakdown is that at small sizes, current leakage poses greater challenges, and also causes the chip to heat up, which creates a threat of thermal runaway and therefore further increases energy costs.
Energy consumed and time needed per operation:
• 64-bit multiply-add
• Read 64 bits from cache
• Move 64 bits across chip
• Execute an instruction
• Read 64 bits from DRAM
• Most recent computers have FMA (fused multiply-add): x ← x + y*z in one cycle
• Intel Xeon earlier models and AMD Opteron have SSE2
  - 2 flops/cycle DP & 4 flops/cycle SP
• Intel Xeon Nehalem ('09) & Westmere ('10) have SSE4
  - 4 flops/cycle DP & 8 flops/cycle SP
• Intel Xeon Sandy Bridge ('11) & Ivy Bridge ('12) have AVX
  - 8 flops/cycle DP & 16 flops/cycle SP
• Intel Xeon Haswell ('13) & Broadwell ('14) have AVX2
  - 16 flops/cycle DP & 32 flops/cycle SP
  - Xeon Phi (per core) is also at 16 flops/cycle DP & 32 flops/cycle SP
• Intel Xeon Skylake (server) ('15) has AVX-512
  - 32 flops/cycle DP & 64 flops/cycle SP
We are here
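As a concrete illustration (a minimal sketch, not from the slides): one AVX2 fused multiply-add instruction operates on 4 doubles and counts as 8 flops, and with two FMA units per core Haswell reaches the 16 DP flops/cycle listed above. The routine name and loop below are hypothetical; compile with -mavx2 -mfma.

```c
#include <immintrin.h>

/* y[i] = a*x[i] + y[i]; n is assumed to be a multiple of 4 (illustration only) */
void axpy_avx2_fma(int n, double a, const double *x, double *y)
{
    __m256d va = _mm256_set1_pd(a);              /* broadcast a into all 4 lanes */
    for (int i = 0; i < n; i += 4) {
        __m256d vx = _mm256_loadu_pd(&x[i]);     /* load 4 doubles               */
        __m256d vy = _mm256_loadu_pd(&y[i]);
        vy = _mm256_fmadd_pd(va, vx, vy);        /* fused a*x + y: 4 muls + 4 adds = 8 flops */
        _mm256_storeu_pd(&y[i], vy);
    }
}
```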
[Plot: performance in GFlop/s vs. matrix/vector size N for dgemm (Level-3 BLAS), dgemv (Level-2 BLAS), and daxpy (Level-1 BLAS); asymptotic rates roughly 54 Gflop/s, 3.4 Gflop/s, and 1.6 Gflop/s respectively.]
1 core of an Intel Haswell i7-4850HQ, 2.3 GHz (Turbo Boost to 3.5 GHz); memory DDR3L-1600 MHz; 6 MB shared L3 cache, and each core has a private 256 KB L2 and 64 KB L1. The theoretical double-precision peak is 56 Gflop/s per core. Compiled with gcc and using vecLib.
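A small sketch of the three BLAS levels shown in the plot (assuming a CBLAS implementation such as OpenBLAS, MKL, or Apple's Accelerate is linked). The point is arithmetic intensity: daxpy and dgemv do O(1) flops per word moved, while dgemm does O(n) flops per word, which is why only dgemm approaches peak.

```c
#include <cblas.h>

/* Illustrative calls only; A, B, C are n x n column-major, x and y have length n. */
void blas_levels_demo(int n, double *A, double *B, double *C, double *x, double *y)
{
    /* Level 1: y = 2*x + y        ~2n flops on ~3n words moved              */
    cblas_daxpy(n, 2.0, x, 1, y, 1);

    /* Level 2: y = A*x + y        ~2n^2 flops on ~n^2 words moved           */
    cblas_dgemv(CblasColMajor, CblasNoTrans, n, n, 1.0, A, n, x, 1, 1.0, y, 1);

    /* Level 3: C = A*B + C        ~2n^3 flops on ~4n^2 words: the only level
       with enough reuse per word moved to run near peak                     */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 1.0, C, n);
}
```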
Unblocked (column-by-column) step:
• Factor column with Level 1 BLAS
• Divide by pivot row
• Schur complement update (rank-1 update)
Main points
Next step
Blocked (panel) step:
• Factor panel with Level 1, 2 BLAS
• Triangular update
• Schur complement update
Main points
Next step
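A compact sketch of the blocked step described above ("factor panel, triangular update, Schur complement update"), written with standard LAPACKE/CBLAS calls and assuming column-major storage. For brevity it does not apply the panel's row interchanges to the rest of the matrix (dlaswp), so it shows the update structure rather than a complete pivoted LU.

```c
#include <lapacke.h>
#include <cblas.h>

/* One right-looking blocked LU sweep: for each panel of width nb,
 *   1) factor the panel (Level-1/2 BLAS work inside dgetrf/dgetf2),
 *   2) triangular update of the block row U12 (dtrsm),
 *   3) Schur complement update of the trailing matrix (dgemm, Level-3).
 * Pivot indices stored in ipiv[k..] are local to each panel in this sketch. */
void blocked_lu_sketch(int n, double *A, int lda, int *ipiv, int nb)
{
    for (int k = 0; k < n; k += nb) {
        int jb = (n - k < nb) ? (n - k) : nb;

        /* 1) factor the (n-k) x jb panel */
        LAPACKE_dgetrf(LAPACK_COL_MAJOR, n - k, jb, &A[k + k*lda], lda, &ipiv[k]);

        if (k + jb < n) {
            /* 2) U12 = L11^{-1} * A12 */
            cblas_dtrsm(CblasColMajor, CblasLeft, CblasLower, CblasNoTrans,
                        CblasUnit, jb, n - k - jb,
                        1.0, &A[k + k*lda], lda, &A[k + (k+jb)*lda], lda);

            /* 3) A22 = A22 - L21 * U12  (the rank-jb Schur complement update) */
            cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                        n - k - jb, n - k - jb, jb,
                        -1.0, &A[(k+jb) + k*lda], lda, &A[k + (k+jb)*lda], lda,
                        1.0, &A[(k+jb) + (k+jb)*lda], lda);
        }
    }
}
```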
Software/algorithms follow hardware evolution in time:
• LINPACK (70's): vector operations; relies on Level-1 BLAS operations
• LAPACK (80's): blocking, cache friendly; relies on Level-3 BLAS operations
• ScaLAPACK (90's): distributed memory; relies on PBLAS and message passing
• PLASMA: new algorithms (many-core friendly)
• MAGMA: hybrid algorithms (heterogeneity friendly)
Parallelization of LU and QR: parallelize the update, which is built from dgetf2 (panel factorization), dtrsm (+ dswp), and dgemm (the trailing-matrix update of A(2) once the panel A(1) has been factored into L and U).
Fork-join parallelism: bulk synchronous processing.
[Trace: cores vs. time.]
[DAG figure: one factorization step expands into an xGETF2 panel task, followed by xTRSM tasks, followed by many xGEMM update tasks.]
The numerical program generates tasks, and the runtime system executes those tasks respecting their data dependences.
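A minimal sketch of that idea (not PLASMA/QUARK source) using OpenMP tasks with depend clauses: each tile kernel becomes a task, and the OpenMP runtime executes the resulting DAG, ordered by data dependences rather than by fork-join barriers. The tile Cholesky pattern, the NT x NT array of tile pointers, and the kernel names are assumptions for illustration.

```c
#include <omp.h>
#define NT 8   /* tiles per dimension (assumed) */

/* Tile kernels (assumed to wrap dpotrf / dtrsm / dsyrk / dgemm on single tiles). */
void potrf_tile(double *Akk);
void trsm_tile(const double *Akk, double *Aik);
void syrk_tile(const double *Aik, double *Aii);
void gemm_tile(const double *Aik, const double *Ajk, double *Aij);

void tile_cholesky(double *A[NT][NT])   /* A[i][j] points to tile (i,j), i >= j */
{
    #pragma omp parallel
    #pragma omp single
    for (int k = 0; k < NT; k++) {
        #pragma omp task depend(inout: A[k][k][0])
        potrf_tile(A[k][k]);

        for (int i = k + 1; i < NT; i++) {
            #pragma omp task depend(in: A[k][k][0]) depend(inout: A[i][k][0])
            trsm_tile(A[k][k], A[i][k]);
        }
        for (int i = k + 1; i < NT; i++) {
            #pragma omp task depend(in: A[i][k][0]) depend(inout: A[i][i][0])
            syrk_tile(A[i][k], A[i][i]);
            for (int j = k + 1; j < i; j++) {
                #pragma omp task depend(in: A[i][k][0], A[j][k][0]) \
                                 depend(inout: A[i][j][0])
                gemm_tile(A[i][k], A[j][k], A[i][j]);
            }
        }
    }
    /* Tasks from different k overlap freely; the dependences, not barriers,
       enforce the correct order (the DAG replaces bulk-synchronous steps). */
}
```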
• LU, QR, or Cholesky on a sparse / dense matrix system
• TRSMs, QRs, or LUs
• TRSMs, TRMMs
• Updates (Schur complement): GEMMs, SYRKs, TRMMs
DAG-based factorization; batched LA
And many other BLAS/LAPACK operations, e.g., for application-specific solvers, preconditioners, and matrices.
Objectives:
• High utilization of each core
• Scaling to a large number of cores
• Synchronization-reducing algorithms
Methodology:
• Dynamic DAG scheduling
• Explicit parallelism
• Implicit communication
• Fine granularity / block data layout
Arbitrary DAG with dynamic scheduling
Fork-join parallelism: notice the synchronization penalty in the presence of heterogeneity.
DAG-scheduled parallelism.
[Execution traces: POTRF, TRTRI, and LAUUM on 48 cores; matrix 4000 x 4000, tile size 200 x 200. Critical path lengths: POTRF+TRTRI+LAUUM run as separate stages: 25 (7t-3); pipelined: 18 (3t+6); Cholesky factorization alone: 3t-2.]
Comparison of task-based runtime systems: PaRSEC, SMPss, StarPU, Charm++, FLAME (w/ SuperMatrix), QUARK, Tblas, PTG.
• Scheduling: distributed (1 per core), replicated (1 per node), or centralized; Charm++ schedules distributed actors.
• Language: sequential code with add_task calls for several of them; internal / affine-loop task descriptions (PaRSEC); message-driven objects (Charm++); an internal linear-algebra DSL (FLAME).
• Accelerator: GPU support.
• Availability: most are public; Tblas and PTG are not available.
Early stage: ParalleX Non-academic: Swarm, MadLINQ, CnC
All projects support Distributed and Shared Memory (QUARK with QUARKd; FLAME with Elemental)
Linpack software package (the basis of HPL)
• Began in the late 70's, a time when floating point operations were expensive compared to other operations and data movement
• … magnitude increase in the number of processors … improvements
HPCG benchmark: http://bit.ly/hpcg-benchmark
Each process owns a local grid of nx × ny × nz points; with an npx × npy × npz process grid, the global problem is (nx·npx) × (ny·npy) × (nz·npz) points.
[Chart: theoretical Peak vs. HPL Rmax (Pflop/s, log scale) for systems at selected TOP500 ranks.]
[Chart: theoretical Peak, HPL Rmax, and HPCG (Pflop/s, log scale) for the same systems.]
Rank | Site | Computer | Cores | Rmax [Pflops] | HPCG [Pflops] | HPCG/HPL | % of Peak
1 | NSCC / Guangzhou | Tianhe-2, NUDT, Xeon 12C 2.2GHz + Intel Xeon Phi 57C + Custom | 3,120,000 | 33.86 | 0.580 | 1.7% | 1.1%
2 | RIKEN Advanced Institute for Computational Science | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect | 705,024 | 10.51 | 0.460 | 4.4% | 4.1%
3 | DOE/SC/Oak Ridge Nat Lab | Titan, Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x | 560,640 | 17.59 | 0.322 | 1.8% | 1.2%
4 | DOE/NNSA/LANL/SNL | Trinity, Cray XC40, Intel E5-2698v3, Aries custom | 301,056 | 8.10 | 0.182 | 2.3% | 1.6%
5 | DOE/SC/Argonne National Laboratory | Mira, BlueGene/Q, Power BQC 16C 1.60GHz, Custom | 786,432 | 8.58 | 0.167 | 1.9% | 1.7%
6 | HLRS/University of Stuttgart | Hazel Hen, Cray XC40, Intel E5-2680v3, Infiniband FDR | 185,088 | 5.64 | 0.138 | 2.4% | 1.9%
7 | NASA / Mountain View | Pleiades, SGI ICE X, Intel E5-2680, E5-2680V2, E5-2680V3, Infiniband FDR | 186,288 | 4.08 | 0.131 | 3.2% | 2.7%
8 | Swiss National Supercomputing Centre (CSCS) | Piz Daint, Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x | 115,984 | 6.27 | 0.124 | 2.0% | 1.6%
9 | KAUST / Jeddah | Shaheen II, Cray XC40, Intel Haswell 2.3 GHz 16C, Cray Aries | 196,608 | 5.53 | 0.113 | 2.1% | 1.6%
10 | Texas Advanced Computing Center / Univ. of Texas | Stampede, PowerEdge C8220, Xeon E5-2680 8C 2.7GHz, Infiniband, Phi SE10P | 522,080 | 5.16 | 0.096 | 1.9% | 1.0%
Rank | Site | Computer | Cores | Rmax [Pflops] | HPCG [Pflops] | HPCG/HPL | % of Peak
11 | Forschungszentrum Jülich | JUQUEEN, BlueGene/Q | 458,752 | 5.009 | 0.095 | 1.9% | 1.6%
12 | Information Technology Center, Nagoya University | ITC Nagoya, Fujitsu PRIMEHPC FX100 | 92,160 | 2.91 | 0.086 | 3.0% | 2.7%
13 | Leibniz Rechenzentrum | SuperMUC, iDataPlex DX360M4, Xeon E5-2680 8C 2.70GHz, Infiniband FDR | 147,456 | 2.897 | 0.083 | 2.9% | 2.6%
14 | EPSRC/University of Edinburgh | ARCHER, Cray XC30, Intel Xeon E5 v2 12C 2.700GHz, Aries interconnect | 118,080 | 1.643 | 0.081 | 4.9% | 3.2%
15 | DOE/SC/LBNL/NERSC | Edison, Cray XC30, Intel Xeon E5-2695v2 12C 2.4GHz, Aries interconnect | 133,824 | 1.655 | 0.079 | 4.8% | 3.1%
16 | National Institute for Fusion Science | Plasma Simulator, Fujitsu PRIMEHPC FX100, SPARC64 XIfx, Custom | 82,944 | 2.376 | 0.073 | 3.1% | 2.8%
17 | GSIC Center, Tokyo Institute of Technology | TSUBAME 2.5, Cluster Platform SL390s G7, Xeon X5670 6C 2.93GHz, Infiniband QDR, NVIDIA K20x | 76,032 | 2.785 | 0.073 | 2.6% | 1.3%
18 | HLRS/Universitaet Stuttgart | Hornet, Cray XC40, Xeon E5-2680 v3 2.5 GHz, Cray Aries | 94,656 | 2.763 | 0.066 | 2.4% | 1.7%
19 | Max-Planck-Gesellschaft MPI/IPP | iDataPlex DX360M4, Intel Xeon E5-2680v2 10C 2.800GHz, Infiniband | 65,320 | 1.283 | 0.061 | 4.8% | 4.2%
20 | CEIST / JAMSTEC | Earth Simulator, NEC SX-ACE | 8,192 | 0.487 | 0.058 | 11.9% | 11.0%
• Break the fork-join model
• Use methods which have a lower bound on communication
• 2x speed of ops and 2x speed for data movement
• Today's machines are too complicated; build "smarts" into software to adapt to the hardware
• Implement algorithms that can recover from failures/bit flips
• Today we can't guarantee this. We understand the issues, but some of our "colleagues" have a hard time with this.
• PLASMA
• MAGMA
• QUARK (runtime for shared memory)
• PaRSEC (Parallel Runtime Scheduling and Execution Controller)
Collaborating partners:
University of Tennessee, Knoxville
University of California, Berkeley
University of Colorado, Denver