TSUBAME3 and ABCI: Supercomputer Architectures for HPC and AI / BD Convergence

SLIDE 1

TSUBAME3 and ABCI: Supercomputer Architectures for HPC and AI / BD Convergence

Satoshi Matsuoka

Professor, GSIC, Tokyo Institute of Technology / Director, AIST-Tokyo Tech. Big Data Open Innovation Lab / Fellow, Artificial Intelligence Research Center, AIST, Japan /

  • Vis. Researcher, Advanced Institute for Computational Science, Riken

GTC2017 Presentation 2017/05/09

SLIDE 2

2

Tremendous Recent Rise in Interest by the Japanese Government on Big Data, DL, AI, and IoT

  • Three national centers on Big Data and AI launched by three competing Ministries for FY 2016 (Apr 2015-)

– METI – AIRC (Artificial Intelligence Research Center): AIST (AIST internal budget + > $200 million FY 2017), April 2015

  • Broad AI/BD/IoT, industry focus

– MEXT – AIP (Artificial Intelligence Platform): Riken and other institutions (~$50 mil), April 2016

  • A separate Post-K related AI funding as well.
  • Narrowly focused on DNN

– MOST – Universal Communication Lab: NICT ($50~55 mil)

  • Brain-related AI

– $1 billion commitment on inter-ministry AI research over 10 years

Vice Minister Tsuchiya @ MEXT announcing AIP establishment

SLIDE 3

Core Center of AI for Industry-Academia Co-operation

[Slide diagram: AIRC structure. AI research (data-knowledge integration AI; brain-inspired AI with models of the hippocampus, basal ganglia, and cerebral cortex; ontology, knowledge, logic & probabilistic modeling, Bayesian nets) feeds a Common AI Platform (common modules such as planning, control, prediction, recommendation, image recognition, 3D object recognition; common data/models; standard tasks, standard data, AI research framework; planning/business team). The platform deploys AI into real businesses and society across application domains (NLP/NLU, text mining, behavior mining & modeling; manufacturing, industrial robots, automobile, innovative retailing, health care, elderly care; security, network services, communication; big sciences, bio-medical sciences, material sciences), with technology transfer and joint research to start-ups, institutions, and companies, forming effective cycles among research and deployment of AI]

2015- AI Research Center (AIRC), AIST, now > 400 FTEs

Director: Jun-ichi Tsujii

Matsuoka: joint appointment as "Designated" Fellow since July 2017

SLIDE 4

[Slide diagram: the joint lab connects Tokyo Institute of Technology / GSIC (TSUBAME 3.0/2.5 Big Data / AI resources; industrial collaboration in data and applications; ITCS departments; other Big Data / AI research organizations and proposals, e.g. JST BigData CREST, JST AI CREST) with the AIST Artificial Intelligence Research Center (AIRC) and its ABCI AI Bridging Cloud Infrastructure (application areas: natural language processing, robotics, security), under the National Institute of Advanced Industrial Science and Technology (AIST) and the Ministry of Economy, Trade and Industry (METI), plus industry partners. The lab provides resources and acceleration of AI / Big Data systems research, basic research in Big Data / AI algorithms and methodologies, and joint research on AI / Big Data and applications]

Joint Lab established Feb. 2017 to pursue BD/AI joint research using large-scale HPC BD/AI infrastructure

Director: Satoshi Matsuoka

SLIDE 5

Characteristics of Big Data and AI Computing

As BD / AI: graph analytics (e.g. social networks); sort, hash (e.g. DB, log analysis); symbolic processing (traditional AI)
As HPC task: integer ops & sparse matrices; data movement, large memory; sparse and random data, low locality

As BD / AI: dense LA: DNN inference, training, generation
As HPC task: dense matrices, reduced precision; dense and well-organized networks and data; acceleration, scaling

Opposite ends of the HPC computing spectrum, but HPC simulation apps can also be categorized likewise.

Acceleration via supercomputers adapted to AI/BD

SLIDE 6

(Big Data) BYTES capabilities, in bandwidth and capacity, unilaterally important but often missing from modern HPC machines in their pursuit of FLOPS…

  • Need BOTH bandwidth and capacity (BYTES) in a HPC-BD/AI machine:
  • Obvious for the lefthand sparse, bandwidth-dominated apps
  • But also for the righthand DNN: strong scaling, large networks and datasets, in particular for future 3D dataset analysis such as CT scans, seismic simulation vs. analysis…

(Source: http://www.dgi.com/images/cvmain_overview/CV4DOverview_Model_001.jpg) (Source: https://www.spineuniverse.com/image-library/anterior-3d-ct-scan-progressive-kyphoscoliosis)

Our measurement of the breakdown of one iteration of CaffeNet training on TSUBAME-KFC/DL (mini-batch size of 256), plotted against the number of nodes: computation on GPUs occupies only 3.9%.

A proper architecture must support large memory capacity and BW; network latency and BW are important.
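The 3.9% figure is the crux: in naive data-parallel training, gradient exchange and staging, not GPU math, dominate each iteration. A back-of-the-envelope model of this (a sketch only; the parameter count, compute time, and bandwidth below are illustrative assumptions tuned to show how a ~4% compute fraction can arise, not the TSUBAME-KFC/DL measurements):

```python
# Rough model of one data-parallel SGD iteration: GPU compute vs. gradient
# exchange.  All figures are illustrative assumptions chosen to land near the
# measured ~4% figure above -- they are NOT the TSUBAME-KFC/DL measurements.
params = 61e6                      # CaffeNet/AlexNet-class model, ~61M weights
grad_bytes = 4 * params            # FP32 gradients, ~244 MB per iteration
t_compute = 0.05                   # s: forward+backward for a 256 mini-batch
eff_bw = 0.4e9                     # B/s: effective per-node exchange bandwidth

t_comm = 2 * grad_bytes / eff_bw   # push gradients, pull updated weights
frac = t_compute / (t_compute + t_comm)
print(f"compute {t_compute*1e3:.0f} ms, exchange {t_comm*1e3:.0f} ms "
      f"-> GPU compute is ~{frac:.1%} of the iteration")
```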

SLIDE 7

The Current Status of AI & Big Data in Japan

We need the triad of advanced algorithms / infrastructure / data, but we lack the cutting-edge infrastructure dedicated to AI & Big Data (c.f. HPC): R&D of ML algorithms & SW, AI & data infrastructures, "Big" Data.

"Big" Data: IoT communication, location & other data; petabytes of drive-recording video; FA & robots; web access and merchandise. Use of massive-scale data is now wasted; seeking innovative applications of AI & data: AI venture startups, big companies, AI/BD R&D (also science).

In HPC, the Cloud continues to be insufficient for cutting-edge research => dedicated SCs dominate & race to exascale.

Massive rise in computing requirements (1 AI-PF/person?); massive "Big" Data in training.

Riken AIP, AIST-AIRC, NICT-UCRI: Joint RWBC Open Innov. Lab (OIL) (Director: Matsuoka)

Over $1B Govt. AI investment over 10 years

AI/BD Centers & Labs in National Labs & Universities

SLIDE 8

TSUBAME3.0

2006 TSUBAME1.0: 80 Teraflops, #1 Asia #7 World, "Everybody's Supercomputer"
2010 TSUBAME2.0: 2.4 Petaflops, #4 World, "Greenest Production SC"
2013 TSUBAME2.5 upgrade: 5.7PF DFP / 17.1PF SFP, 20% power reduction
2013 TSUBAME-KFC: #1 Green 500
2017 TSUBAME3.0+2.5: ~18PF (DFP), 4~5PB/s Mem BW, 10GFlops/W power efficiency, Big Data & Cloud convergence

Large-scale simulation, Big Data analytics, industrial apps; 2011 ACM Gordon Bell Prize

2017 Q2 TSUBAME3.0: Leading Machine Towards Exa & Big Data

1. "Everybody's Supercomputer" - high performance (12~24 DP Petaflops, 125~325TB/s Mem, 55~185Tbit/s NW), innovative high cost/performance packaging & design, in a mere 180m2…
2. "Extreme Green" - ~10GFlops/W power-efficient architecture, system-wide power control, advanced cooling, future energy reservoir load leveling & energy recovery
3. "Big Data Convergence" - BYTES-centric architecture, extreme high BW & capacity, deep memory hierarchy, extreme I/O acceleration, Big Data SW stack for machine learning, graph processing, …
4. "Cloud SC" - dynamic deployment, container-based node co-location & dynamic configuration, resource elasticity, assimilation of public clouds…
5. "Transparency" - full monitoring & user visibility of machine & job state, accountability via reproducibility

8

SLIDE 9

TSUBAME-KFC/DL: TSUBAME3 Prototype [ICPADS2014]

High-temperature cooling: oil loop 35~45℃ ⇒ water loop 25~35℃ (c.f. TSUBAME2: 7~17℃); cooling tower: water 25~35℃ ⇒ to ambient air

Oil immersive cooling + hot water cooling + high-density packaging + fine-grained power monitoring and control; upgraded to /DL Oct. 2015

Container facility: 20-foot container (16m2), fully unmanned operation

Single rack, high-density oil immersion: 168 NVIDIA K80 GPUs + Xeons, 413+ TFlops (DFP), 1.5 PFlops (SFP), ~60KW/rack

Nov. 2013 / June 2014: World #1 Green500

SLIDE 10

Overview of TSUBAME3.0

BYTES-centric architecture, scalability to all 2160 GPUs, all nodes, the entire memory hierarchy

Full-bisection-bandwidth Intel Omni-Path interconnect: 4 ports/node, full bisection, 432 Terabits/s bidirectional, ~x2 the BW of the entire Internet backbone traffic

DDN storage (Lustre FS 15.9PB + Home 45TB)

540 compute nodes: SGI ICE XA + new blade, Intel Xeon CPU x2 + NVIDIA Pascal GPU x4 (NVLink), 256GB memory, 2TB Intel NVMe SSD; 47.2 AI-Petaflops, 12.1 Petaflops (DFP)

Full operations Aug. 2017
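As a sanity check, the 432 Terabit/s figure follows directly from the node and port counts above (a worked calculation assuming the stated 4 x 100Gbps Omni-Path ports per node):

```python
# Verify the quoted interconnect aggregate from the node/port counts above.
nodes, ports_per_node, gbps_per_port = 540, 4, 100  # Intel Omni-Path links

injection = nodes * ports_per_node * gbps_per_port  # one direction, in Gbps
print(f"aggregate injection: {injection / 1e3:.0f} Tbit/s")               # 216
print(f"full bisection, bidirectional: {2 * injection / 1e3:.0f} Tbit/s")  # 432
```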
SLIDE 11

TSUBAME3: A Massively BYTES Centric Architecture for Converged BD/AI and HPC

11

[Node memory/network hierarchy diagram: intra-node GPU via NVLink 20~40GB/s; inter-node GPU via Omni-Path 12.5GB/s, fully switched; HBM2 64GB @ 2.5TB/s; DDR4 256GB @ 150GB/s; Intel Optane 1.5TB @ 12GB/s (planned); NVMe Flash 2TB @ 3GB/s; 16GB/s PCIe, fully switched]

~4 Terabytes/node of hierarchical memory for Big Data / AI (c.f. K computer: 16GB/node) → over 2 Petabytes in TSUBAME3, movable at 54 Terabytes/s, or 1.7 Zettabytes / year

Terabit-class network/node: 800Gbps (400+400), full bisection

Any "Big" Data in the system can be moved anywhere via RDMA at minimum 12.5GBytes/s, also with stream processing. Scalable to all 2160 GPUs, not just 8.
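The per-node and system-wide capacity claims can be cross-checked from the tier sizes in the diagram (a worked calculation; summing the tiers this way is my reading of the "~4 Terabytes/node" figure):

```python
# Cross-check "~4 TB/node" and "over 2 PB in TSUBAME3" from the tiers above.
nodes = 540
tiers_gb = {"HBM2": 64, "DDR4": 256, "Optane (planned)": 1536, "NVMe flash": 2048}

per_node_tb = sum(tiers_gb.values()) / 1024
print(f"per node: {per_node_tb:.1f} TB, system: {per_node_tb * nodes / 1024:.2f} PB")

# And 54 TB/s of aggregate movement sustained for a year:
seconds_per_year = 365 * 24 * 3600
print(f"54 TB/s x 1 year = {54e12 * seconds_per_year / 1e21:.1f} ZB")  # ~1.7
```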

SLIDE 12

TSUBAME3: A Massively BYTES Centric Architecture for Converged BD/AI and HPC (same diagram and figures as Slide 11)
SLIDE 13

TSUBAME3.0 Co-Designed SGI ICE-XA Blade (new)

  • No exterior cable mess (power, NW, water)
  • Plan to become a future HPE product
SLIDE 14

TSUBAME3.0 Compute Node: SGI ICE-XA, a New GPU Compute Blade Co-Designed by SGI and Tokyo Tech GSIC

[Node block diagram: CPU 0 / CPU 1 (QPI) with 4 DIMMs each; PLX PCIe switches fanning out x16 PCIe to GPUs 0-3 (NVLink among the GPUs) and to 4 Omni-Path HFIs; PCH with SSD and 2x Intel Optane NVM on x4 PCIe / DMI]

SGI ICE XA infrastructure: Intel Omni-Path spine switches, full-bisection fat-tree network, 432 Terabit/s bidirectional for HPC and DNN. x9 compute blades per switch pair, x60 sets (540 nodes), x60 pairs (120 switches total, 18 ports each).

Ultra high performance & bandwidth "Fat Node":

  • High performance: 4 SXM2 (NVLink) NVIDIA Pascal P100 GPUs + 2 Intel Xeons, 84 AI-TFlops
  • High network bandwidth: Intel Omni-Path 100Gbps x 4 = 400Gbps (100Gbps per GPU)
  • High I/O bandwidth: Intel 2 TeraByte NVMe, > 1PB & 1.5~2TB/s system total; future Optane 3D XPoint memory, a Petabyte or more directly accessible
  • Ultra high density, hot-water-cooled blades: 36 blades / rack = 144 GPUs + 72 CPUs, 50-60KW, x10 thermals c.f. IDC

400Gbps / node for HPC and DNN; Terabytes of memory per node

SLIDE 15

TSUBAME 2.0/2.5/3.0 Node Performances

| Metric | TSUBAME2.0 (2010) | TSUBAME2.5 (2013) | TSUBAME3.0 (2017) | Factor |
|---|---|---|---|---|
| CPU Cores x Frequency (GHz) | 35.16 | 35.16 | 72.8 | 2.07 |
| CPU Memory Capacity (GB) | 54 | 54 | 256 | 4.74 |
| CPU Memory Bandwidth (GB/s) | 64 | 64 | 153.6 | 2.40 |
| GPU CUDA Cores | 1,344 | 8,064 | 14,336 | 1.78 |
| GPU FP64 (TFLOPS) | 1.58 | 3.93 | 21.2 | 13.4 & 5.39 |
| GPU FP32 (TFLOPS) | 3.09 | 11.85 | 42.4 | 13.7 & 3.58 |
| GPU FP16 (TFLOPS) | 3.09 | 11.85 | 84.8 | 27.4 & 7.16 |
| GPU Memory Capacity (GB) | 9 | 18 | 64 | 7.1 & 3.56 |
| GPU Memory Bandwidth (GB/s) | 450 | 750 | 2928 | 6.5 & 3.90 |
| SSD Capacity (GB) | 120 | 120 | 2000 | 16.67 |
| SSD READ (MB/s) | 550 | 550 | 2700 | 4.91 |
| SSD WRITE (MB/s) | 500 | 500 | 1800 | 3.60 |
| Network Injection BW (Gbps) | 80 | 80 | 400 | 5.00 |
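Where the Factor column shows two numbers (e.g. "13.4 & 5.39"), they are TSUBAME3.0 relative to 2.0 and to 2.5 respectively; a quick check of a few GPU rows confirms this reading:

```python
# Reproduce the two-valued Factor column: T3.0 vs. T2.0 and T3.0 vs. T2.5.
rows = {  # metric: (TSUBAME2.0, TSUBAME2.5, TSUBAME3.0)
    "GPU FP64 (TFLOPS)":    (1.58, 3.93, 21.2),
    "GPU FP16 (TFLOPS)":    (3.09, 11.85, 84.8),
    "GPU Memory BW (GB/s)": (450, 750, 2928),
}
for name, (t20, t25, t30) in rows.items():
    print(f"{name}: x{t30 / t20:.1f} vs 2.0, x{t30 / t25:.2f} vs 2.5")
# -> 13.4 & 5.39, 27.4 & 7.16, 6.5 & 3.90, matching the table
```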

SLIDE 16

TSUBAME3.0 Datacenter

15 SGI ICE-XA Racks 2 Network Racks 3 DDN Storage Racks

20 Total Racks

Compute racks cooled with 32℃ warm water; year-round ambient cooling

  • Av. PUE = 1.033
SLIDE 17

Site Comparisons of AI-FP Performance

[Bar chart: PFLOPS by site, split into DFP 64bit / SFP 32bit / HFP 16bit (simulation; computer graphics, gaming; big data, machine learning / AI), for Riken (K), U-Tokyo (Oakforest-PACS (JCAHPC), Reedbush (U&H)), and Tokyo Tech (TSUBAME3.0 + T2.5 + T-KFC): 65.8 Petaflops total]

Tokyo Tech GSIC leads Japan in aggregated AI-capable FLOPS (TSUBAME3 + 2.5 + KFC: ~6700 GPUs + ~4000 CPUs), across all supercomputers and clouds

[Chart: NVIDIA Pascal P100 DGEMM performance, GFLOPS vs. matrix dimension (m=n=k, 2000~16000), for P100-fp16, P100, and K40]
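The GEMM curves above are what "AI-FLOPS" style ratings rest on. A minimal way to reproduce such a curve on any machine (a sketch using NumPy; on an NVIDIA GPU the same loop works with cupy in place of numpy — the sizes and repetition count are my choices):

```python
import time
import numpy as np  # swap in cupy to run the same measurement on a GPU

def gemm_gflops(n: int, dtype=np.float32, reps: int = 3) -> float:
    """Measure dense matrix-multiply throughput for n x n matrices."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    dt = (time.perf_counter() - t0) / reps
    return 2 * n**3 / dt / 1e9  # one n x n GEMM costs ~2n^3 flops

for n in (2000, 4000, 8000):    # matrix dimension m = n = k, as in the chart
    print(f"n={n}: {gemm_gflops(n):,.0f} GFLOPS")
```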

SLIDE 18

JST-CREST “Extreme Big Data” Project (2013-2018)

Supercomputers (compute & batch-oriented, more fragile) + Cloud IDC (very low BW & efficiency, but highly available, resilient) => Convergent Architecture (Phases 1~4): large-capacity NVM, high-bisection NW

[Diagram: PCB / TSV-interposer node packaging: a high-powered main CPU plus low-power CPUs, each with DRAM and NVM/Flash; 2Tbps HBM with 4~6 HBM channels, 1.5TB/s DRAM & NVM BW, 30PB/s I/O BW possible, 1 Yottabyte / year]

EBD System Software, incl. the EBD Object System: graph store, EBD bag, KVS x KVS (EBD KVS Cartesian plane), co-designed with the applications

Co-design applications: large-scale metagenomics; massive sensors and data assimilation in weather prediction; ultra-large-scale graphs and social infrastructures

Exascale Big Data HPC co-design: from FLOPS-centric to BYTES-centric HPC

Given a top-class supercomputer, how fast can we accelerate next-generation big data c.f. conventional clouds?

Issues regarding architecture, algorithms, and system software in co-design. Performance model? Use of accelerators, e.g. GPUs?

SLIDE 19

Sparse BYTES: The Graph500 – 2015~2016 world #1 x4 with the K computer. Tokyo Tech [Matsuoka EBD CREST], Univ. Kyushu [Fujisawa Graph CREST], Riken AICS, Fujitsu

| List | Rank | GTEPS | Implementation |
|---|---|---|---|
| November 2013 | 4 | 5524.12 | Top-down only |
| June 2014 | 1 | 17977.05 | Efficient hybrid |
| November 2014 | 2 | 19585.2 | Efficient hybrid |
| June & Nov 2015, June & Nov 2016 | 1 | 38621.4 | Hybrid + node compression |

BYTES-rich machine + superior BYTES algorithm:
  • K computer: 88,000 nodes, 660,000 CPU cores, 1.3 Petabytes mem, 20GB/s Tofu NW
  • LLNL-IBM Sequoia: 1.6 million CPUs, 1.6 Petabytes mem
  • TaihuLight: 10 million CPUs, 1.3 Petabytes mem

[Chart: elapsed time (ms) per BFS, communication vs. computation, 64 nodes (Scale 30) vs. 65,536 nodes (Scale 40): 73% of total execution time is spent waiting on communication]

Effective x13 performance c.f. Linpack:
#1 38621.4 GTEPS (#7 10.51PF Top500)
#2 23755.7 GTEPS (#1 93.01PF Top500)
#3 23751 GTEPS (#4 17.17PF Top500)

BYTES, not FLOPS!
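Graph500 ranks machines by traversed edges per second (TEPS) on breadth-first search over a synthetic Kronecker graph, which is exactly why it rewards BYTES rather than FLOPS. A toy single-process version of the measured kernel (my illustration; the real benchmark distributes a Scale-30~40 graph across the whole machine):

```python
import random
import time
from collections import deque

def bfs_teps(adj, root=0):
    """Run one BFS and return traversed edges/second, the Graph500 metric."""
    parent = {root: root}
    queue = deque([root])
    edges = 0
    t0 = time.perf_counter()
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            edges += 1
            if v not in parent:  # random access: memory-bound, not flop-bound
                parent[v] = u
                queue.append(v)
    return edges / (time.perf_counter() - t0)

# Tiny random graph standing in for the Scale-30/40 Kronecker graphs above.
n, degree = 1 << 16, 16
adj = [[random.randrange(n) for _ in range(degree)] for _ in range(n)]
print(f"{bfs_teps(adj) / 1e6:.1f} MTEPS (toy single-core scale)")
```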

SLIDE 20

Distributed Large-Scale Dynamic Graph Data Store (work with LLNL, [SC16 etc.])

Node-level dynamic graph data store:
  • Extended for multi-process use with an async MPI communication framework
  • Follows an adjacency-list format and leverages open-address hashing to construct its tables

Multi-node experiment: 2 billion insertions/s
[Chart: inserted billion edges/sec vs. number of nodes (24 processes per node)]

Baselines:
  • STINGER: a state-of-the-art dynamic graph processing framework developed at Georgia Tech
  • Baseline model: a naive implementation using the Boost library (C++) and the MPI communication framework

Based on K computer results, adapting to (1) a deep memory hierarchy and (2) rapid dynamic graph changes

  • K. Iwabuchi, S. Sallinen, R. Pearce, B. Van Essen, M. Gokhale, and S. Matsuoka, "Towards a Distributed Large-Scale Dynamic Graph Data Store", 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

C.f. STINGER (single-node, on-memory): [chart: dynamic graph construction speedup (on-memory & NVM) for 6 / 12 / 24 parallel processes, baseline vs. DegAwareRHH, up to 212x]

Dynamic graph construction (on-memory & NVM): the K computer has large memory, but very expensive DRAM only; develop algorithms and SW exploiting large hierarchical memory. A dynamic graph store with the world's top graph update performance and scalability.
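A minimal single-process sketch of the data structure described above: adjacency lists reached through an open-address hash table with linear probing (my illustration of the idea, not the actual DegAwareRHH code, which adds degree-awareness, MPI distribution, and NVM placement):

```python
class ToyDynamicGraphStore:
    """Adjacency-list graph store whose vertex table uses open addressing."""

    _EMPTY = object()

    def __init__(self, capacity: int = 1 << 20):  # fixed capacity for brevity
        self.cap = capacity
        self.keys = [self._EMPTY] * capacity
        self.adj = [None] * capacity

    def _slot(self, v):
        i = hash(v) % self.cap
        while self.keys[i] is not self._EMPTY and self.keys[i] != v:
            i = (i + 1) % self.cap  # linear probing on collision
        return i

    def insert_edge(self, u, v):
        i = self._slot(u)
        if self.keys[i] is self._EMPTY:  # first time we see vertex u
            self.keys[i] = u
            self.adj[i] = []
        self.adj[i].append(v)

    def neighbors(self, u):
        i = self._slot(u)
        return self.adj[i] if self.keys[i] == u else []

g = ToyDynamicGraphStore()
g.insert_edge(1, 2); g.insert_edge(1, 3)
print(g.neighbors(1))  # [2, 3]
```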

SLIDE 21

Xtr2sort: Out-of-core Sorting Acceleration using GPU and Flash NVM [IEEE BigData2016]

  • Sample-sort-based out-of-core sorting approach for deep-memory-hierarchy systems w/ GPU and Flash NVM
    – I/O chunking to fit the device memory capacity of the GPU
    – Pipeline-based latency hiding to overlap data transfers between NVM, CPU, and GPU using asynchronous data transfers, e.g., cudaMemcpyAsync(), libaio

[Chart: GPU vs. GPU + CPU + NVM vs. CPU + NVM configurations: x4.39 speedup]

How to combine deepening memory layers for future HPC/Big Data workloads, targeting the Post-Moore era?

A BYTES-centric HPC algorithm: reconciling fast, GPU-bandwidth sorting with large capacity via non-volatile memory
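To make the chunk-and-spill idea concrete, here is a compact out-of-core sort in the same spirit (a pure-Python sketch: sorted in-memory chunks stand in for the GPU-sorted pieces, temp files for the NVM tier, and I use the classic run-and-merge formulation rather than the paper's sample-sort partitioning or its asynchronous pipelining):

```python
import heapq
import os
import random
import tempfile

def out_of_core_sort(stream, chunk_size=1 << 16):
    """Sort a stream larger than 'device memory': sort fixed-size chunks
    (the GPU-sized pieces), spill each run to a file (the NVM tier),
    then lazily k-way merge the spilled runs."""
    runs, chunk = [], []
    for x in stream:
        chunk.append(x)
        if len(chunk) == chunk_size:
            runs.append(_spill(sorted(chunk)))  # the GPU would sort this chunk
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    return heapq.merge(*map(_stream_run, runs))

def _spill(run):
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    f.writelines(f"{x}\n" for x in run)
    f.close()
    return f.name

def _stream_run(path):
    with open(path) as f:
        for line in f:          # streamed back one element at a time
            yield float(line)
    os.unlink(path)

data = (random.random() for _ in range(300_000))
out = out_of_core_sort(data)
print("first three:", [round(next(out), 3) for _ in range(3)])
```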

SLIDE 22

Estimated Compute Resource Requirements for Deep Learning [Source: Preferred Networks Japan Inc.]

[Chart: required FLOPS vs. year (2015~2030), from 10PF to 100EF; P: Peta, E: Exa, F: Flops]

  • Auto driving: 1E~100E Flops. 1TB per car per day; training on 100 days of driving data from 10~1,000 cars. Alternatively, 1TB per car per year, training on data from 1 million ~ 100 million cars.
  • Bio / healthcare, image recognition, robots / drones: 10P~ Flops. Speech data: 5,000 hours from 10,000 people; training on 100,000 hours of artificially generated speech data [Baidu 2015].
  • 100P~1E Flops. Genome analysis: ~10M SNPs per person; 100 PFlops for 1 million people, 1 EFlops for 100 million people.
  • Image/video recognition: 10P (image) ~ 10E (video) Flops. Training data: 100 million images, 10,000-class classification, 6 months on several thousand nodes [Google 2015].

Machine learning and deep learning become more accurate as training data grows. Today the target is human-generated data; from now on, machine-generated data will be the target.

Each estimate assumes that 1 TFlops is needed to complete the learning phase on 1GB of training data in one day.

It's the FLOPS (in reduced precision) and BW! So both are important in the infrastructure.
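The footnote's rule of thumb (1 TFlops sustained trains 1GB of data in one day) reproduces the chart's ranges directly; the auto-driving case works out as follows (a worked calculation using the slide's own data sizes):

```python
# Rule of thumb from the slide: training on 1 GB in one day needs 1 TFLOPS.
def flops_to_finish_in_a_day(dataset_gb: float) -> float:
    return dataset_gb * 1e12

# Auto driving: 1 TB/day per car, 100 days of driving data.
for cars in (10, 1000):
    dataset_gb = cars * 100 * 1000  # cars x days x (1 TB/day = 1000 GB)
    print(f"{cars:4d} cars -> {flops_to_finish_in_a_day(dataset_gb):.0e} FLOPS")
# ->  10 cars -> 1e+18 (1 EFlops); 1000 cars -> 1e+20 (100 EFlops),
#     exactly the 1E~100E Flops range quoted above
```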

SLIDE 23

Example AI Research: Predicting Statistics of Asynchronous SGD Parameters for a Large-Scale Distributed Deep Learning System on GPU Supercomputers

Background

  • In large-scale Asynchronous Stochastic Gradient Descent (ASGD), mini-batch size and gradient staleness tend to be large and unpredictable, which increases the error of the trained DNN

[Figure: ASGD updates in DNN parameter space. Each update applies $W^{(t+1)} = W^{(t)} - \eta \sum_i \nabla E_i$ to the objective function $E$; two asynchronous updates arriving within one gradient computation give staleness = 2 (vs. staleness = 0 for an undisturbed update)]

[Plots: measured vs. predicted distributions of mini-batch size and staleness for 4 / 8 / 16 nodes; N_Subbatch = # of samples per GPU iteration]

Proposal

  • We propose an empirical performance model for the ASGD deep learning system SPRINT which considers the probability distributions of mini-batch size and staleness

  • Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, and Satoshi Matsuoka, "Predicting Statistics of Asynchronous SGD Parameters for a Large-Scale Distributed Deep Learning System on GPU Supercomputers", in proceedings of 2016 IEEE International Conference on Big Data (IEEE BigData 2016), Washington D.C., Dec. 5-8, 2016
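To make "staleness" concrete, here is a small simulation of the quantity being modeled (my generic illustration of the ASGD mechanism, not the SPRINT model itself):

```python
import random

def simulate_asgd_staleness(n_workers=16, updates=10_000, seed=0):
    """Staleness of a gradient = number of global updates applied between
    the moment a worker read the weights and the moment its update lands."""
    rng = random.Random(seed)
    clock = 0                     # global update counter
    started_at = [0] * n_workers  # clock value when each worker last read W
    staleness = []
    for _ in range(updates):
        w = rng.randrange(n_workers)          # next worker to finish
        staleness.append(clock - started_at[w])
        clock += 1                            # its update is applied
        started_at[w] = clock                 # it re-reads W and restarts
    return staleness

s = simulate_asgd_staleness()
print(f"mean staleness {sum(s) / len(s):.1f}, max {max(s)}")
# Mean is ~n_workers - 1: staleness grows with scale, which is why the
# slide calls it "large and unpredictable" in large-scale ASGD.
```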

SLIDE 24

Performance Prediction of Future HW for CNN

  • Predicts the best performance with two future architectural extensions:
  • FP16: precision reduction to double the peak floating-point performance
  • EDR IB: 4x EDR InfiniBand (100Gbps) upgrade from FDR (56Gbps)

→ Not only the # of nodes, but also a fast interconnect is important for scalability

24

TSUBAME-KFC/DL, ILSVRC2012 dataset deep learning: prediction of the best parameters (average mini-batch size 138±25%)

| Configuration | N_Node | N_Subbatch | Epoch Time | Average Minibatch Size |
|---|---|---|---|---|
| Current HW | 8 | 8 | 1779 | 165.1 |
| FP16 | 7 | 22 | 1462 | 170.1 |
| EDR IB | 12 | 11 | 1245 | 166.6 |
| FP16 + EDR IB | 8 | 15 | 1128 | 171.5 |

2016/08/08 SWoPP2016

SLIDE 25

Open Source Release of EBD System Software (install on T3/Amazon/ABCI)

  • mrCUDA: rCUDA extension enabling remote-to-local GPU migration
    • https://github.com/EBD-CREST/mrCUDA
    • GPL 3.0
    • Co-funded by NVIDIA
  • Huron FS (w/LLNL): I/O burst buffer for inter-cloud environments
    • https://github.com/EBD-CREST/cbb
    • Apache License 2.0
    • Co-funded by Amazon
  • ScaleGraph Python: Python extension for the ScaleGraph X10-based distributed graph library
    • https://github.com/EBD-CREST/scalegraphpython
    • Eclipse Public License v1.0
  • GPUSort: GPU-based large-scale sort
    • https://github.com/EBD-CREST/gpusort
    • MIT License
  • Others, including the dynamic graph store

SLIDE 26

HPC and BD/AI Convergence Example [Yutaka Akiyama, Tokyo Tech]

Genomics: oral/gut metagenomics, ultra-fast sequence analysis. Protein-protein interactions: exhaustive PPI prediction system, pathway predictions. Drug discovery: fragment-based virtual screening, learning-to-rank VS.

  • Ohue et al., Bioinformatics (2014)
  • Suzuki et al., Bioinformatics (2015)
  • Suzuki et al., PLOS ONE (2016)
  • Matsuzaki et al., Protein Pept Lett (2014)
  • Suzuki et al., AROB2017 (2017)
  • Yanagisawa et al., GIW (2016)
  • Yamasawa et al., IIBMP (2016)

26

SLIDE 27

EBD vs. EBD: Large-Scale Homology Search for Metagenomics

Next-generation sequencers:
  • Revealing uncultured microbiomes and finding novel genes in various environments (human body, sea, soil)
  • Applied to human health in recent years

O(n) measured data against an O(m) reference database → an O(m·n) calculation (correlation, similarity search): EBD vs. EBD

Metagenomic analysis of periodontitis patients:
  • With Tokyo Dental College, Prof. Kazuyuki Ishihara
  • Comparative metagenomic analysis between healthy persons and patients
  • High-risk microorganisms are detected [charts: taxonomic composition and metabolic-pathway abundance, increasing in patients]

SLIDE 28

Development of Ultra-fast Homology Search Tools

GHOSTZ (subsequence clustering): x240 faster than the conventional algorithm
[Chart: computational time for 10,000 sequences (sec., log scale; 3.9 GB DB, 1 CPU core), BLAST vs. GHOSTZ]
Suzuki, et al. Bioinformatics, 2015.

GHOSTZ-GPU (multithreading on GPU): x70 faster than 1 core when using 12 cores + 3 GPUs (TSUBAME 2.5 thin node GPU)
[Chart: speed-up ratio vs. 1 core for 1C, 1C+1G, 12C+1G, 12C+3G]
Suzuki, et al. PLOS ONE, 2016.

GHOST-MP (MPI + OpenMP hybrid parallelization): retaining strong scaling up to 100,000 cores on TSUBAME 2.5; x80~x100 faster than mpi-BLAST
Kakuta, et al. (submitted)

SLIDE 29

Plasma Protein Binding (PPB) Prediction by Machine Learning

Application to peptide drug discovery

Problems:
  • Candidate peptides tend to be degraded and excreted faster than small-molecule drugs
  • Strong need to design bio-stable peptides as drug candidates
  • Previous PPB prediction software for small molecules cannot predict peptide PPB

Solution: compute feature values (more than 500 features: LogS, LogP, MolWeight, SASA, polarity, …) and combine them to build a predictive model f for the PPB value

[Scatter plots: experimental vs. predicted PPB values; the constructed model explains peptide PPB well, R2 = 0.905]
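The modeling step described above is, mechanically, supervised regression over computed molecular descriptors. A minimal sketch of that workflow (scikit-learn on synthetic placeholder data; the model choice and all sizes here are mine, not the paper's):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Placeholder for the real inputs: 500+ computed descriptors per peptide
# (LogS, LogP, MolWeight, SASA, polarity, ...) with measured PPB as target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 500))                  # 300 peptides x 500 features
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=300)  # synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out peptides: {r2_score(y_te, model.predict(X_te)):.3f}")
```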

SLIDE 30

RWBC-OIL 2-3: Tokyo Tech IT-Drug Discovery Factory: Simulation & Big Data & AI at Top HPC Scale

(Tonomachi, Kawasaki City; planned 2017, PI Yutaka Akiyama)

Tokyo Tech's research seeds:
  ① Drug target selection system
  ② Glide-based virtual screening
  ③ Novel algorithms for fast virtual screening against huge databases

A new drug discovery platform, especially for specialty peptides and nucleic acids: plasma binding (ML-based) and membrane penetration (molecular dynamics simulation)

Minister of Health, Labour and Welfare Award of the 11th annual Merit Awards for Industry-Academia-Government Collaboration. TSUBAME's GPU environment allows world's top-tier virtual screening.

  • Yoshino et al., PLOS ONE (2015)
  • Chiba et al., Sci Rep (2015)

Fragment-based efficient algorithm designed for 100-million-compound data

  • Yanagisawa et al., GIW (2016)

Application projects: a drug discovery platform powered by supercomputing and machine learning. Investments from the JP Govt., Tokyo Tech (TSUBAME SC), the municipal govt. (Kawasaki), and JP & US pharma.

Multi-Petaflops compute and Peta~Exabytes of data processing, continuously: a cutting-edge, large-scale HPC & BD/AI infrastructure is absolutely necessary

SLIDE 31

METI AIST-AIRC ABCI as the World's First Large-Scale OPEN AI Infrastructure

31

Univ. Tokyo Kashiwa Campus

  • 130~200 AI-Petaflops
  • < 3MW power
  • < 1.1 avg. PUE
  • Operational 2017Q4 ~ 2018Q1

ABCI: AI Bridging Cloud Infrastructure

  • Top-level SC compute & data capability for DNN (130~200 AI-Petaflops)
  • Open public & dedicated infrastructure for AI & Big Data algorithms, software and applications
  • Platform to accelerate joint academic-industry R&D for AI in Japan
SLIDE 32

ABCI Prototype: AIST AI Cloud (AAIC), March 2017 (System Vendor: NEC)

  • 400x NVIDIA Tesla P100s and InfiniBand EDR accelerate various AI workloads including ML (Machine Learning) and DL (Deep Learning)
  • Advanced data analytics leveraged by 4PiB of shared Big Data storage and Apache Spark w/ its ecosystem

AI Computation System (400 Pascal GPUs, 30TB memory, 56TB SSD):
  • Computation nodes (w/ GPU) x50: Intel Xeon E5 v4 x2, NVIDIA Tesla P100 (NVLink) x8, 256GiB memory, 480GB SSD
  • Computation nodes (w/o GPU) x68: Intel Xeon E5 v4 x2, 256GiB memory, 480GB SSD
  • Mgmt & service nodes x16, interactive nodes x2

Large Capacity Storage System: DDN SFA14K
  • File servers (w/ 10GbE x2, IB EDR x4) x4
  • 8TB 7.2Krpm NL-SAS HDD x730
  • GRIDScaler (GPFS), > 4PiB effective, RW 100GB/s

Computation network: Mellanox CS7520 Director Switch, EDR (100Gbps) x216, bidirectional 200Gbps, full bisection bandwidth

Service and management network: IB EDR (100Gbps); GbE or 10GbE

Firewall: FortiGate 3815D x2, FortiAnalyzer 1000E x2 (UTM firewall, 40-100Gbps class, 10GbE)

SINET5 Internet connection, 10-100GbE

SLIDE 33

The "Real" ABCI – 2018Q1

  • Extreme computing power
    – w/ 130~200 AI-PFlops for AI/ML, especially DNN
    – x1 million speedup over a high-end PC: 1-day training for a 3000-year DNN training job
    – TSUBAME-KFC (1.4 AI-PFlops) x 90 users (T2 avg)
  • Big Data and HPC converged modern design
    – For advanced data analytics (Big Data) and scientific simulation (HPC), etc.
    – Leverages Tokyo Tech's "TSUBAME3" design, with AI/BD-centric differences and enhancements
  • Ultra high bandwidth and low latency in memory, network, and storage
    – For accelerating various AI/BD workloads
    – Data-centric architecture, optimizes data movement
  • Big Data/AI and HPC SW stack convergence
    – Incl. results from JST-CREST EBD
    – Wide contributions from the PC Cluster community desirable

33
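The "x1 million" speedup claim is straightforward arithmetic on the stated job length (a worked check):

```python
# A DNN training job that would take 3000 years on a high-end PC, done in 1 day:
print(f"{3000 * 365.25:,.0f}x")  # ~1,095,750x, i.e. the quoted "x1 million"
```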

SLIDE 34

ABCI Procurement Benchmarks

  • Big Data benchmarks
    – (SPEC CPU Rate)
    – Graph500
    – MinuteSort
    – Node-local storage I/O
    – Parallel FS I/O
  • AI/ML benchmarks
    – Low-precision GEMM: the CNN kernel, defines "AI-Flops"
    – Single-node CNN: AlexNet and GoogLeNet, ILSVRC2012 dataset
    – Multi-node CNN: Caffe + MPI
    – Large-memory CNN: ConvNet on Chainer
    – RNN / LSTM: to be determined

34

No traditional HPC simulation benchmarks except SPEC CPU
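Since "AI-Flops" is defined by the low-precision GEMM kernel, the system ratings quoted throughout the deck can be re-derived from per-GPU FP16 peaks (a worked check; the per-GPU figure is read off the Slide 15 node table as 84.8/4):

```python
# "AI-Flops" = reduced-precision (FP16) peak, per the GEMM definition above.
p100_fp16_tflops = 84.8 / 4  # per-GPU FP16 peak from the Slide 15 node table
gpus = 2160                  # TSUBAME3.0
print(f"TSUBAME3.0: ~{gpus * p100_fp16_tflops / 1000:.1f} AI-Petaflops")
# -> ~45.8, close to the quoted 47.2; the remainder presumably comes from
#    the Xeon hosts' own floating-point peak.
```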

SLIDE 35

Cutting-Edge Research AI Infrastructures in Japan: Accelerating BD/AI with HPC (and my effort to design & build them)

  • Oct. 2015: TSUBAME-KFC/DL (Tokyo Tech/NEC), 1.4 AI-PF (Petaflops) [in production]
  • Mar. 2017: AIST AI Cloud (AIST-AIRC/NEC), 8.2 AI-PF [x5.8; in production]
  • Mar. 2017: AI Supercomputer (Riken AIP/Fujitsu), 4.1 AI-PF [under acceptance]
  • Aug. 2017: TSUBAME3.0 (Tokyo Tech/HPE), 47.2 AI-PF (65.8 AI-PF w/ TSUBAME2.5) [x5.8; being manufactured]
  • Mar. 2018: ABCI (AIST-AIRC), 130-200 AI-PF [x2.8~4.2; draft RFC out, IDC under construction]
  • 1H 2019?: "ExaAI", ~1 AI-ExaFlop [x5.0~7.7; undergoing engineering study]

R&D investments into world-leading AI/BD HW & SW & algorithms and their co-design for cutting-edge infrastructure are absolutely necessary (just as with Japan's Post-K and the US ECP in HPC)

SLIDE 36

Backups

SLIDE 37

Big Data / AI-Oriented Supercomputers

[Diagram: mutual and semi-automated co-acceleration of HPC and BD/ML/AI]
  • Acceleration, scaling, and control of HPC via BD/ML/AI and future SC designs: accelerating conventional HPC apps; future Big Data / AI supercomputer design; optimizing system software and ops
  • Acceleration and scaling of BD/ML/AI via HPC technologies and infrastructures: Big Data and ML/AI apps and methodologies (robots / drones, image and video, large-scale graphs)

Co-design of BD/ML/AI with HPC, using BD/ML/AI, for the survival of HPC

ABCI: the world's first and largest open 100 Peta AI-Flops AI supercomputer, Fall 2017, for co-design

SLIDE 38

  • Strategy 5: Develop shared public datasets and environments for AI training and testing. The depth, quality, and accuracy of training datasets and resources significantly affect AI performance. Researchers need to develop high quality datasets and environments and enable responsible access to high-quality datasets as well as to testing and training resources.

  • Strategy 6: Measure and evaluate AI technologies through standards and benchmarks. Essential to advancements in AI are standards, benchmarks, testbeds, and community engagement that guide and evaluate progress in AI. Additional research is needed to develop a broad spectrum of evaluative techniques.

We are implementing the US AI&BD strategies already… in Japan, at AIRC w/ABCI

SLIDE 39

The "Chicken or Egg Problem" of AI-HPC Infrastructures

  • "On-premise" machines at clients => "Can't invest big in AI machines unless we forecast good ROI. We don't have the experience of running on big machines."
  • Public clouds other than the giants => "Can't invest big in AI machines unless we forecast good ROI. We are cutthroat."
  • Large-scale supercomputer centers => "Can't invest big in AI machines unless we forecast good ROI. Can't sacrifice our existing clients, and our machines are full."
  • Thus the giants dominate, and AI technologies, big data, and people stay behind the corporate firewalls…

SLIDE 40

But Commercial Companies, esp. the "AI Giants", are Leading AI R&D, are they not?

  • Yes, but that is because their short-term goals could harvest the low-hanging fruit in DNN-rejuvenated AI
  • But AI/BD research is just beginning: if we leave it to the interests of commercial companies, we cannot tackle difficult problems with no proven ROI
  • Very unhealthy for research
  • This is different from more mature fields, such as pharmaceuticals or aerospace, where there are balanced investments and innovations in both academia/government and industry

SLIDE 41

Japanese Open Supercomputing Sites, Aug. 2017 (pink = HPCI sites)

| Peak Rank | Institution | System | Double-FP Rpeak (PF) | Nov. 2016 Top500 |
|---|---|---|---|---|
| 1 | U-Tokyo/Tsukuba U JCAHPC | Oakforest-PACS: PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path | 24.9 | 6 |
| 2 | Tokyo Institute of Technology GSIC | TSUBAME 3.0: HPE/SGI ICE-XA custom, NVIDIA Pascal P100 + Intel Xeon, Intel Omni-Path | 12.1 | NA |
| 3 | Riken AICS | K computer: SPARC64 VIIIfx 2.0GHz, Tofu interconnect, Fujitsu | 11.3 | 7 |
| 4 | Tokyo Institute of Technology GSIC | TSUBAME 2.5: Cluster Platform SL390s G7, Xeon X5670 6C 2.93GHz, InfiniBand QDR, NVIDIA K20x, NEC/HPE | 5.71 | 40 |
| 5 | Kyoto University | Camphor 2: Cray XC40, Intel Xeon Phi 68C 1.4GHz | 5.48 | 33 |
| 6 | Japan Aerospace eXploration Agency | SORA-MA: Fujitsu PRIMEHPC FX100, SPARC64 XIfx 32C 1.98GHz, Tofu interconnect 2 | 3.48 | 30 |
| 7 | Information Tech. Center, Nagoya U | Fujitsu PRIMEHPC FX100, SPARC64 XIfx 32C 2.2GHz, Tofu interconnect 2 | 3.24 | 35 |
| 8 | National Inst. for Fusion Science (NIFS) | Plasma Simulator: Fujitsu PRIMEHPC FX100, SPARC64 XIfx 32C 1.98GHz, Tofu interconnect 2 | 2.62 | 48 |
| 9 | Japan Atomic Energy Agency (JAEA) | SGI ICE X, Xeon E5-2680v3 12C 2.5GHz, InfiniBand FDR | 2.41 | 54 |
| 10 | AIST AI Research Center (AIRC) | AAIC (AIST AI Cloud): NEC/SMC cluster, NVIDIA Pascal P100 + Intel Xeon, InfiniBand EDR | 2.2 | NA |

SLIDE 42

Molecular Dynamics Simulation for Membrane Permeability

Application to peptide drug discovery

Sequence D-Pro, D-Leu, D-Leu, L-Leu, D-Leu, L-Tyr: membrane permeability 7.9 × 10^-6 cm/s
Sequence D-Pro, D-Leu, D-Leu, D-Leu, D-Leu, L-Tyr: membrane permeability 0.045 × 10^-6 cm/s (x0.006)

Problems:
1) A single residue mutation can drastically change membrane permeability
2) Standard MD simulation cannot follow membrane permeation, a millisecond-order phenomenon. Ex) Membrane thickness 40 Å, peptide membrane permeability 7.9 × 10^-6 cm/s: a typical peptide membrane permeation takes 40 Å / 7.9 × 10^-6 cm/s = 0.5 millisecond

Solutions:
1) Apply enhanced sampling: supervised MD (SuMD), metadynamics (MTD) [plot: free energy vs. collective variable (CV)]
2) GPU acceleration and massively parallel computation (GROMACS / DESMOND MD engines on GPU)
  • Millisecond-order phenomena can be simulated
  • Hundreds of peptides can be calculated simultaneously on TSUBAME
SLIDE 43

ABCI Cloud Infrastructure

  • Ultra-dense IDC design from the ground up
    – Custom inexpensive lightweight "warehouse" building w/ substantial earthquake tolerance
    – x20 the thermal density of a standard IDC
  • Extreme green
    – Ambient warm-liquid cooling, large Li-ion battery storage, and high-efficiency power supplies, etc.
    – Commoditizing supercomputer cooling technologies to clouds (60KW/rack)
  • Cloud ecosystem
    – Wide-ranging Big Data and HPC standard software stacks
  • Advanced cloud-based operation
    – Incl. dynamic deployment, container-based virtualized provisioning, multitenant partitioning, and automatic failure recovery, etc.
    – Joining the HPC and Cloud software stacks for real
  • The final piece in the commoditization of HPC (into the IDC)

43

[ABCI AI-IDC CG image; reference image, source: an NEC case study]

SLIDE 44

ABCI Cloud Data Center: "Commoditizing the 60KW/rack Supercomputer"

[Data center image & layout plan: high-voltage transformers (3.25MW); passive cooling tower (free cooling, capacity 3MW); active chillers (capacity 200kW); lithium battery (1MWh, 1MVA); W:18m x D:24m x H:8m; 72 racks + 18 racks, with future expansion space]

  • Single floor, inexpensive build
  • Hard concrete floor: 2 tonnes/m2 weight tolerance for racks and cooling pods
  • Number of racks: initial 90, max 144
  • Power capacity: 3.25 MW (max)
  • Cooling capacity: 3.2 MW (minimum, in summer)
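The rack counts, feed, and cooling figures above imply an average rack load well below the 60KW design point (a worked check; the per-rack figure is the cooling-pod design point from the next slide):

```python
# Cross-check the building budget against the 60 kW/rack cooling design point.
racks_initial, racks_max = 90, 144
kw_per_rack_max = 60           # 50 kW liquid + 10 kW air (next slide)
power_kw, cooling_kw = 3250, 3200

print(f"90 racks at full draw: {racks_initial * kw_per_rack_max / 1000:.1f} MW")
print(f"feed supports ~{power_kw // kw_per_rack_max} racks at the full 60 kW")
# -> 5.4 MW vs. a 3.25 MW feed: the design evidently budgets the *average*
#    rack load well below the per-rack maximum.
```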

SLIDE 45

Implementing 60KW Cooling in a Cloud IDC: Cooling Pods

[Cooling block diagram (hot rack): a 19- or 23-inch rack (48U) of computing servers; water blocks (CPU or/and accelerator, etc.) on a hot water circuit (40℃) and cold water circuit (32℃) via a CDU; fan coil units (cooling capacity 10kW) between the hot aisle (40℃) and cold aisle (35℃), moving air from 40℃ to 35℃; hot aisle capping; flat concrete slab with 2 tonnes/m2 weight tolerance]

Water cooling capacity:
  • Fan coil unit: 10KW/rack
  • Water block: 50KW/rack

Commoditizing supercomputing cooling density and efficiency:
  • Warm water cooling at 32℃
  • Liquid cooling & air cooling in the same rack
  • 60KW cooling capacity: 50KW liquid + 10KW air
  • Very low PUE
  • Structural integrity by rack + skeleton frame built on a high flat-floor load
SLIDE 46

TSUBAME3.0 & ABCI Comparison

  • TSUBAME3: "Big Data and AI-oriented supercomputer". ABCI: "Supercomputer-oriented next-gen IDC template for AI & Big Data"
  • The two machines are sisters, but the above dictates their differences
  • Hardware: TSUBAME3 still emphasizes DFP performance as well as extreme injection and bisection interconnect bandwidth. ABCI does not require high DFP performance, and reduces the interconnect requirement for cost reduction and IDC friendliness
  • TSUBAME3 node & machine packaging is custom co-designed as a supercomputer based on SGI/HPE ICE-XA, with extreme performance density (3.1 PetaFlops/rack), thermal density (61KW/rack), and extremely low PUE = 1.033. ABCI aims for similar density and efficiency in a 19-inch IDC ecosystem
  • Both will converge the HPC and BD/AI/ML software stacks, but ABCI's adoption of the latter will be quicker and more comprehensive given the nature of the machine
  • The major theme of ABCI is "How to disseminate TSUBAME3-class AI-oriented supercomputing in the Cloud" ==> other performance parameters are similar to TSUBAME3
    • Compute and data parameters are similar except for the interconnect
    • Thermal density (50~60KW/rack c.f. 3~6KW/rack for a standard IDC), PUE < 1.1 (standard IDC 1.5~3)
  • We are also building the ABCI-IDC as a proof-of-concept datacenter building infrastructure that will be a template for future high-density high-performance "convergent" datacenters

SLIDE 47

TSUBAME3.0 & ABCI Comparison Chart

| | TSUBAME3 (2017/7) | ABCI (2018/3) | C.f.: K (2012) |
|---|---|---|---|
| Peak AI performance (AI-FLOPS) | 47.2 PFlops (DFP 12.1 PFlops), 3.1 PetaFlops/rack | 130~200 PFlops (DFP NA), 3~4 PetaFlops/rack | 11.3 Petaflops, 12.3 TFlops/rack |
| System packaging | Custom SC (ICE-XA), liquid cooled | 19-inch rack (LC), ABCI-IDC | Custom SC (LC) |
| Operational power incl. cooling | Below 1MW | Approx. 2MW | Over 15MW |
| Max rack thermals & PUE | 61KW, 1.033 | 50-60KW, below 1.1 | ~20KW, ~1.3 |
| Node hardware architecture | Many-core (NVIDIA Pascal P100) + multi-core (Intel Xeon) | Many-core AI/DL-oriented processor (incl. GPUs) | Heavyweight multi-core |
| Memory technology | HBM2 + DDR4 | On-die memory + DDR4 | DDR3 |
| Network technology | Intel Omni-Path, 4 x 100Gbps/node, full bisection, inter-switch optical network | Both injection & bisection BW scaled down c.f. T3 to save cost & be IDC friendly; copper | Tofu 6-D torus custom interconnect |
| Per-node non-volatile memory | 2 TeraByte NVMe/node | > 400GB NVMe/node | None |
| Power monitoring and control | Detailed node / whole-system power monitoring & control | Detailed node / whole-system power monitoring & control | Whole-system monitoring only |
| Cloud and virtualization, AI | All nodes container virtualization, horizontal node splits, Cloud API dynamic provisioning, ML stack | All nodes container virtualization, horizontal node splits, Cloud API dynamic provisioning, ML stack | None |

SLIDE 48

Basic Requirements for an AI Cloud System

[Software stack diagram]
  • BD/AI user applications: Python, Jupyter Notebook, R etc. + IDL; SQL (Hive/Pig); CloudDB/NoSQL (HBase/MongoDB/Redis); RDB (PostgreSQL); web services
  • Libraries & tools: machine learning libraries; deep learning frameworks; graph computing libraries; numerical libraries (BLAS/Matlab); Fortran/C/C++ native codes; BD algorithm kernels (sort etc.); parallel debuggers and profilers; workflow systems
  • Middleware: batch job schedulers; resource brokers; PFS (Lustre, GPFS); DFS (HDFS); local Flash + 3D XPoint storage; Linux containers & cloud services; MPI, OpenMP/ACC, CUDA/OpenCL
  • Platform: Linux OS; IB/OPA high-capacity low-latency NW; x86 (Xeon, Phi) + accelerators, e.g. GPU, FPGA, Lake Crest

Application
  • Easy use of various ML/DL/graph frameworks from Python, Jupyter Notebook, R, etc.
  • Web-based applications and services provision
System software
  • HPC-oriented techniques for numerical libraries, BD algorithm kernels, etc.
  • Support for long-running jobs / workflows for DL
  • Accelerated I/O and secure data access to large data sets
  • User-customized environments based on Linux containers for easy deployment and reproducibility
OS / Hardware
  • Modern supercomputing facilities based on commodity components