SLIDE 1

TSUBAME2.0, 2.5 towards 3.0 for Convergence of Extreme Computing and Big Data

Satoshi Matsuoka Professor Global Scientific Information and Computing (GSIC) Center Tokyo Institute of Technology Fellow, Association for Computing Machinery (ACM)

HP-CAST SC2014 Presentation, New Orleans, USA, 2014/11/14

SLIDE 2


[System scaling figure (32nm/40nm process): >400GB/s mem BW, 80Gbps NW BW, ~1KW max at the node level; >1.6TB/s and >12TB/s mem BW, 35KW max at intermediate levels; >600TB/s mem BW, 220Tbps NW bisection BW, 1.4MW max for the full system.]

TSUBAME2.0 Nov. 1, 2010 “The Greenest Production Supercomputer in the World”

  • GPU-centric (> 4000) high performance & low power
  • Small footprint (~200m2 or 2000 sq.ft), low TCO
  • High bandwidth memory, optical network, SSD storage…

TSUBAME 2.0 New Development

SLIDE 3

TSUBAME2.0⇒2.5 Thin Node Upgrade (Fall 2013)

HP SL390G7 (developed for TSUBAME2.0, modified for 2.5)

GPU: NVIDIA Kepler K20X x3, 1310 GFlops / 6 GByte mem (per GPU)

CPU: Intel Westmere-EP 2.93GHz x2; multi I/O chips, 72 PCI-e lanes (16 x 4 + 4 x 2) --- 3 GPUs + 2 IB QDR. Memory: 54 or 96 GB DDR3-1333. SSD: 60GB x2 or 120GB x2

Thin node: Infiniband QDR x2 (80Gbps). Productized as HP ProLiant SL390s, modified for TSUBAME2.5.

Per-node peak perf. 4.08 TFlops, ~800GB/s mem BW, 80Gbps NW, ~1KW max. GPU upgrade: NVIDIA Fermi M2050 (1039/515 GFlops SFP/DFP) → NVIDIA Kepler K20X (3950/1310 GFlops SFP/DFP).

SLIDE 4

Phase-field simulation for dendritic solidification [Shimokawabe, Aoki et al.], Gordon Bell 2011 Winner

  • Peta-scale phase-field simulations can simulate multiple dendritic growth during solidification, required for the evaluation of new materials.

  • 2011 ACM Gordon Bell Prize Special Achievements in Scalability and Time-to-Solution

Weak scaling on TSUBAME (single precision); mesh size (1 GPU + 4 CPU cores): 4096 x 162 x 130

TSUBAME 2.0: 2.000 PFlops (4,000 GPUs + 16,000 CPU cores), 4,096 x 6,480 x 13,000

TSUBAME 2.5: 3.444 PFlops (3,968 GPUs + 15,872 CPU cores), 4,096 x 5,022 x 16,640

Developing lightweight strengthening materials by controlling microstructure → low-carbon society

SLIDE 5

Application performance, TSUBAME2.0 → TSUBAME2.5 (boost ratio):

  • Top500/Linpack, 4131 GPUs (PFlops): 1.192 → 2.843 (2.39x)
  • Green500/Linpack, 4131 GPUs (GFlops/W): 0.958 → 3.068 (3.20x)
  • Semi-definite programming / nonlinear optimization, 4080 GPUs (PFlops): 1.019 → 1.713 (1.68x)
  • Gordon Bell dendrite stencil, 3968 GPUs (PFlops): 2.000 → 3.444 (1.72x)
  • LBM LES whole-city airflow, 3968 GPUs (PFlops): 0.592 → 1.142 (1.93x)
  • Amber 12 pmemd, 4 nodes / 8 GPUs (nsec/day): 3.44 → 11.39 (3.31x)
  • GHOSTM genome homology search, 1 GPU (sec): 19361 → 10785 (1.80x)
  • MEGADOC protein docking, 1 node / 3 GPUs (vs. 1 CPU core): 37.11 → 83.49 (2.25x)

SLIDE 6

TSUBAME2.0=>2.5 Power Improvement

2013/11 Green500: #6 in the world

  • Along with TSUBAME-KFC (#1)
  • 2014/6: #9

18% power reduction (incl. cooling) from 2012/12 to 2013/12
SLIDE 7

Comparing K Computer to TSUBAME2.5: Perf ≒, Cost <<

K Computer (2011): 11.4 Petaflops SFP/DFP, $1400mil / 6 years (incl. power), ~x30 the cost

TSUBAME2.0 (2010) → TSUBAME2.5 (2013): 17.1 Petaflops SFP, 5.76 Petaflops DFP, $45mil / 6 years (incl. power)

SLIDE 8

TSUBAME2 vs. K Technological Comparisons

(TSUBAME2 Deploying State-of-Art Tech.)

TSUBAME2.5 / BG/Q Sequoia / K Computer:

  • Single precision FP: 17.1 Petaflops / 20.1 Petaflops / 11.3 Petaflops
  • Green500 (MFLOPS/W), Nov. 2013: 3,068.71 (6th) / 2,176.58 (26th) / 830.18 (123rd)
  • Operational power (incl. cooling): ~0.8MW / 5~6MW? / 10~11MW
  • Hardware architecture: many-core (GPU) + multi-core hetero / multi-core homo / multi-core homo
  • Maximum HW threads: >1 billion / ~6 million / ~700,000
  • Memory technology: GDDR5+DDR3 / DDR3 / DDR3
  • Network technology: Luxtera silicon photonics / standard optics / copper
  • Non-volatile memory / SSD: SSD flash on all nodes, ~250TBytes / none / none
  • Power management: node/system active power cap / rack-level measurement only / rack-level measurement only
  • Virtualization: KVM (G & V queues, resource segregation) / none / none

SLIDE 9

TSUBAME3.0: Leadership “Template” Machine

  • Under design: deployment 2016H2~H3
  • High computational power: ~20 Petaflops, ~5 Petabyte/s Mem BW
  • Ultra high density: ~0.6 Petaflops DFP/rack (x10 TSUBAME2.0)
  • Ultra power efficient: 10 Gigaflops/W (x10 TSUBAME2.0, TSUBAME-KFC)

– Latest power control, efficient liquid cooling, energy recovery

  • Ultra high-bandwidth network: over 1 Petabit/s bisection, new topology?

– Bigger capacity than the entire global Internet (several 100Tbps)

  • Deep memory hierarchy and ultra high-bandwidth I/O with NVM

– Petabytes of NVM, several Terabytes/s BW, several 100 million IOPS – Next generation “scientific big data” support

  • Advanced power-aware resource management, high-resiliency SW/HW co-design, VM & container-based dynamic deployment…

SLIDE 10

Focused Research Towards TSUBAME3.0 and Beyond, towards Exa

  • Software and algorithms for the new memory hierarchy – pushing the envelope of low power vs. capacity; Communication and Synchronization Reducing Algorithms (CSRA)

  • Post-petascale networks – topology, routing algorithms, placement algorithms… (SC14 paper, Tue 14:00-14:30, “Fail in Place Network…”)

  • Green computing: power-aware APIs, fine-grained resource scheduling
  • Scientific “Extreme” Big Data – GPU Hadoop acceleration, large graphs, search/sort, deep learning
  • Fault tolerance – group-based hierarchical checkpointing, fault prediction, hybrid algorithms
  • Post-petascale programming – OpenACC extensions and other many-core programming substrates
  • Performance analysis and modeling – for CSRA algorithms, for big data, for the deep memory hierarchy, for fault tolerance, …

SLIDE 11

TSUBAME-KFC

Towards TSUBAME3.0 and Beyond: oil-immersive cooling, #1 Green500 at SC13, ISC14, … (paper @ ICPADS14)

SLIDE 12

Extreme Big Data Examples

Rates and volumes are immense

Social NW – large graph processing

  • Facebook
    – ~1 billion users
    – Average 130 friends
    – 30 billion pieces of content shared per month
  • Twitter
    – 500 million active users
    – 340 million tweets per day
  • Internet
    – 300 million new websites per year
    – 48 hours of video uploaded to YouTube per minute
    – 30,000 YouTube videos played per second

Genomics – advanced sequence matching: sequencing data (bp)/$ becomes x4000 per 5 years, c.f. HPC x33 in 5 years [Lincoln Stein, Genome Biology, vol. 11(5), 2010]; impact of new-generation sequencers.

Social simulation:

  • Applications
    – Target area: Planet (Open Street Map)
    – 7 billion people
  • Input data
    – Road network for Planet: 300GB (XML)
    – Trip data for 7 billion people: 10KB (1 trip) x 7 billion = 70TB
    – Real-time streaming data (e.g., social sensors, physical data)
  • Simulated output for 1 iteration
    – 700TB

Weather – real-time large data assimilation:

  ① 30-sec ensemble forecast simulations: 2 PFLOP, producing ensemble forecasts (200GB)
  ② Ensemble data assimilation: 2 PFLOP; observation inputs: Himawari 500MB/2.5min and phased array radar 1GB/30sec/2 radars, each through quality control and data processing; ensemble analyses (200GB) and analysis data (2GB)
  ③ 30-min forecast simulation: 1.2 PFLOP, producing a 30-min forecast (2GB)

Repeat every 30 sec.

NOT simply mining terabytes of silo data: peta~zettabytes of data, ultra high-BW data streams, highly unstructured and irregular data, and complex correlations between data from multiple sources. Extreme capacity, bandwidth, and compute are all required.

SLIDE 13

Graph500 “Big Data” Benchmark

Kronecker graph, BSP problem; quadrant probabilities A: 0.57, B: 0.19, C: 0.19, D: 0.05
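As an illustration of how such graphs are generated, here is a minimal Python sketch of R-MAT/Kronecker edge sampling with the quadrant probabilities above (the actual Graph500 reference generator is more elaborate; this is only a sketch):

```python
import random

def rmat_edge(scale, a=0.57, b=0.19, c=0.19, d=0.05):
    """Sample one edge of a 2^scale-vertex Kronecker (R-MAT) graph:
    at each of `scale` levels, pick one quadrant of the adjacency
    matrix with probabilities a, b, c, d and descend into it."""
    src = dst = 0
    for _ in range(scale):
        r = random.random()
        src <<= 1
        dst <<= 1
        if r < a:            # top-left quadrant: both bits 0
            pass
        elif r < a + b:      # top-right: destination bit 1
            dst |= 1
        elif r < a + b + c:  # bottom-left: source bit 1
            src |= 1
        else:                # bottom-right: both bits 1
            src |= 1
            dst |= 1
    return src, dst

# Graph500 uses edgefactor 16: 16 * 2^SCALE edges in total
edges = [rmat_edge(scale=10) for _ in range(16 * 2**10)]
```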

November 15, 2010, “Graph 500 Takes Aim at a New Kind of HPC”, Richard Murphy (Sandia NL => Micron): “I expect that this ranking may at times look very different from the TOP500 list. Cloud architectures will almost certainly dominate a major chunk of the list.”

The 8th Graph500 List (June2014): K Computer #1, TSUBAME2 #12

Koji Ueno, Tokyo Institute of Technology / RIKEN AICS

No.1: RIKEN Advanced Institute for Computational Science (AICS)’s K computer is ranked No.1 in the Graph500 ranking of supercomputers with 17977.1 GE/s on Scale 40, in the 8th Graph500 list published at the International Supercomputing Conference, June 22, 2014. Congratulations from the Graph500 Executive Committee.

No.12: Global Scientific Information and Computing Center, Tokyo Institute of Technology’s TSUBAME 2.5 is ranked No.12 in the Graph500 ranking of supercomputers with 1280.43 GE/s on Scale 36, in the 8th Graph500 list published at the International Supercomputing Conference, June 22, 2014. Congratulations from the Graph500 Executive Committee.

#1 K Computer #12 TSUBAME2

Reality: Top500 supercomputers dominate; no cloud IDCs at all

SLIDE 14

A Major Northern Japanese Cloud Datacenter (2013)

[Datacenter network figure: the Internet ← 2x Juniper MX480 (10GbE uplinks, LACP) ← 2 zone switches (Juniper EX8208, Virtual Chassis) ← Juniper EX4200 switches, one per zone of 700 nodes.]

8 zones, total 5600 nodes; injection 1Gbps/node, bisection 160Gbps

Advanced silicon photonics: 40G single CMOS die, 1490nm DFB, 100km fiber

Supercomputer: Tokyo Tech. TSUBAME2.0, #4 in the Top500 (2010): ~1500 nodes (compute & storage), full-bisection multi-rail optical network; injection 80Gbps/node, bisection 220Tbps.

The supercomputer’s bisection is >> the cloud datacenter’s (x1000!) and ~= the entire global Internet average data BW (~200 Tbps, source: Cisco).

SLIDE 15

JST-CREST “Extreme Big Data” Project (2013-2018)

Supercomputers: compute & batch-oriented, more fragile. Cloud IDCs: very low BW & efficiency, but highly available and resilient. → Convergent architecture (phases 1~4): large-capacity NVM, high-bisection NW.

[Convergent node figure: PCB with TSV interposer; a high-powered main CPU plus low-power CPUs, each stacked with DRAM x3 and NVM/Flash x3; 2Tbps HBM, 4~6 HBM channels, 1.5TB/s DRAM & NVM BW, 30PB/s I/O BW possible, 1 Yottabyte / year.]

EBD System Software (incl. EBD Object System)

Co-design applications: large-scale metagenomics; massive sensors and data assimilation in weather prediction; ultra-large-scale graphs and social infrastructures

Exascale big data HPC co-design → future non-silo extreme big data scientific apps

Co-designed EBD data stores: Graph Store, EBD Bag, KVS, EBD KVS Cartesian Plane

Given a top-class supercomputer, how fast can we accelerate next-generation big data c.f. clouds? What are the issues regarding architectural, algorithmic, and system software evolution? Use of GPUs?

SLIDE 16

Towards Extreme-scale Big Data Machine Convergence

  • Computation

– Increase in Parallelism, Heterogeneity, Density, BW

  • Multi-core, many-core processors
  • Heterogeneous processors
  • Deep hierarchical memory/storage architecture
    – NVM (Non-Volatile Memory), SCM (Storage Class Memory): FLASH, PCM, STT-MRAM, ReRAM, HMC, etc.
    – Next-gen HDDs (SMR), tapes (LTFS), cloud

Problems: network, locality, productivity, FT, algorithms, power, storage hierarchy, I/O, heterogeneity, scalability

SLIDE 17

EBD “convergent” system architecture: a 100,000-fold target, spanning cloud datacenters and supercomputers. Layered stack (bottom to top):

  • Big Data & SC HW layer: TSUBAME 3.0, TSUBAME-GoldenBox; NVM (FLASH, PCM, STT-MRAM, ReRAM, HMC, etc.); HPC storage; web object storage; interconnect (InfiniBand, 100GbE); network (SINET5); intercloud / grid (HPCI); cloud datacenter
  • System SW layer: EBD file system, EBD data object, EBD burst I/O buffer, EBD network topology and routing
  • Basic algorithms layer: EBD abstract data models (distributed array, key value, sparse data model, tree, etc.); EBD algorithm kernels (search/sort, matching, graph traversals, etc.); Graph Store, EBD Bag, EBD KVS, Cartesian Plane
  • Programming layer: MapReduce for EBD; workflow/scripting languages for EBD; SQL for EBD; graph framework; message passing (MPI, X10) for EBD; PGAS/global array for EBD
  • Applications: large-scale genomic correlation; data assimilation in large-scale sensors and exascale atmospherics; large-scale graphs and social infrastructure apps

Participating groups: Akiyama, Suzumura, Miyoshi, Matsuoka, Tatebe, Koibuchi.

SLIDE 18

The Graph500 – June 2014: K Computer #1. Tokyo Tech [EBD CREST], Univ. Kyushu [Fujisawa Graph CREST], RIKEN AICS

  • November 2013 list: rank 4, 5524.12 GTEPS, top-down only implementation
  • June 2014 list: rank 1, 17977.05 GTEPS, efficient hybrid implementation

[Chart: elapsed time (ms) split into communication and computation, for 64 nodes (Scale 30, 1236 MTEPS/node) and 65536 nodes (Scale 40, 274 MTEPS/node); problem size is weak scaling. At 65536 nodes, 73% of total execution time is spent waiting in communication.]

SLIDE 19

Out-of-core GPU-MapReduce for Large-scale Graph Processing [Cluster 2014]

[Pipeline figure: per-chunk processing; the CPU streams chunks via memcpy (H2D, D2H) while the GPU runs Map, Sort, Shuffle, Scan, and Reduce operations on each chunk after initialization.]

Problem: GPU memory capacity limits scalable large-scale graph processing. Emergence of large-scale graphs:

  • SNS, road networks, smart grids, etc.
  • Millions to trillions of vertices/edges

→ Need for fast graph processing on supercomputers

[Chart: weak scaling on TSUBAME2.5, performance (MEdges/sec) vs. number of compute nodes, for 1 CPU (S23/node), 1 GPU (S23/node), 2 CPUs (S24/node), 2 GPUs (S24/node), and 3 GPUs (S24/node); 3 GPUs reach 2.10x over 2 CPUs.]

Experimental results: performance improvement over CPUs

  • Map: 1.41x, Reduce: 1.49x, Sort: 4.95x speedup
  • Overlaps communication effectively

Proposal: out-of-core GPU memory management on MapReduce (see the sketch below)

  • Stream-based GPU MapReduce
  • Out-of-core GPU sorting
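A minimal host-side sketch of the stream-based, chunked processing idea (illustrative only; the paper’s implementation runs map/sort on the GPU via CUDA, which this Python stand-in does not):

```python
def stream_mapreduce(records, map_fn, reduce_fn, chunk=1 << 16):
    """Out-of-core MapReduce sketch: process records in chunks sized
    to fit GPU memory; in the real system each chunk is copied H2D,
    mapped and sorted on the GPU, then copied back D2H."""
    groups = {}
    for i in range(0, len(records), chunk):
        for rec in records[i:i + chunk]:               # one streamed chunk
            for key, val in map_fn(rec):               # "map" phase
                groups.setdefault(key, []).append(val) # "shuffle"
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

# Word count as a usage example
text = ["a b a", "b c"]
out = stream_mapreduce(text,
                       lambda line: [(w, 1) for w in line.split()],
                       lambda k, vs: sum(vs))
print(out)  # {'a': 2, 'b': 2, 'c': 1}
```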

EBD Programming Framework

SLIDE 20

GPU-HykSort [IEEE BigData2014]

Motivation

The effectiveness of sorting on large-scale GPU-based heterogeneous systems remains unclear:

  • Appropriate selection of the phases to be offloaded to the GPU is required
  • Handling GPU memory overflow is required

Approach

Offload local sort, the most time-consuming phase, to GPU accelerators

GPU-HykSort achieves a 2.2x performance improvement with a 50GB/s CPU-GPU interconnect.

[Charts: weak-scaling performance, keys/second (billions) vs. # of processes (2 processes per node), for HykSort 1 thread, HykSort 6 threads, and HykSort GPU + 6 threads; performance prediction up to ~1024 nodes / ~2048 GPUs (1.4x, 3.6x, 0.25 TB/s data points).]

Implementation: split the unsorted array into chunks; for each iteration (Iter 1..4), transfer an unsorted chunk to GPU memory, sort the chunk on the GPU, and transfer the sorted chunk back to DRAM; finally merge the sorted chunks into a single sorted array. Across processes 0..3: local sort, select splitters, data transfer, merge.
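To make the chunked local sort concrete, a small sketch of the scheme just described (np.sort stands in for the GPU sort kernel; this is not the GPU-HykSort code):

```python
import numpy as np
from heapq import merge

def out_of_core_local_sort(arr, chunk=1 << 20):
    """Split an array larger than GPU memory into chunks, sort each
    chunk "on the GPU" (np.sort as a stand-in for a CUDA sort after
    an H2D transfer), then k-way merge the sorted runs on the host."""
    runs = [np.sort(np.asarray(arr[i:i + chunk]))
            for i in range(0, len(arr), chunk)]
    return np.fromiter(merge(*runs), dtype=runs[0].dtype, count=len(arr))

data = np.random.randint(0, 10**9, size=5_000_000)
assert np.array_equal(out_of_core_local_sort(data), np.sort(data))
```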

EBD Algorithm Kernels

SLIDE 21

Efficient Parallel Sorting Algorithm for Variable-Length Keys

Aleksandr Drozd, Miquel Pericàs, Satoshi Matsuoka. Efficient String Sorting on Multi- and Many-Core Architectures. In Proceedings of the IEEE 3rd International Congress on Big Data, Anchorage, USA, August 2014. Aleksandr Drozd, Miquel Pericàs, Satoshi Matsuoka. MSD Radix String Sort on GPU: Longer Keys, Shorter Alphabets. In Proceedings of the 142nd Joint Research Meeting on High Performance Computing (HOKKE-21).

  • Comparison-based sorts are inefficient for long/variable-length keys (like strings, e.g. “apple, apricot, banana, kiwi”)
  • Better way: examine individual characters (based on the MSD radix sort algorithm, sketched below)
  • Hybrid parallelization scheme: combining data-parallel and task-parallel stages
  • 70 M keys/second sorting throughput on 100-byte strings
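A compact sketch of the MSD radix idea for strings (the hybrid data/task parallelism and the GPU mapping from the papers are omitted):

```python
def msd_radix_sort(strings, depth=0):
    """Bucket strings by the character at position `depth`, then
    recurse into each bucket; keys are examined character by
    character instead of by whole-key comparisons."""
    if len(strings) <= 1:
        return list(strings)
    exhausted, buckets = [], {}
    for s in strings:
        if len(s) <= depth:
            exhausted.append(s)        # shorter keys sort first
        else:
            buckets.setdefault(s[depth], []).append(s)
    out = exhausted
    for ch in sorted(buckets):         # independent buckets: task parallel
        out += msd_radix_sort(buckets[ch], depth + 1)
    return out

print(msd_radix_sort(["banana", "apple", "apricot", "kiwi"]))
# ['apple', 'apricot', 'banana', 'kiwi']
```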

EBD Algorithm Kernels

SLIDE 22

Large Scale Graph Processing Using NVM

  • 1. Hybrid-BFS (Beamer ’11)
  • 2. Proposal: DRAM holds the highly accessed graph data (loaded before BFS); NVM holds the full-size graph
  • 3. Experiment: Intel Xeon E5-2690 x2 CPUs, 256 GB DRAM, NVM: EBD-I/O 2TB x2 (mSATA SSD x8 on a RAID card, RAID 0)

[Chart: median GigaTEPS (giga traversed edges per second) vs. SCALE (# of vertices = 2^SCALE, 23 to 31), DRAM + EBD-I/O vs. DRAM only; DRAM only reaches 4.1 GTEPS before hitting its capacity limit, while DRAM + EBD-I/O sustains 3.8 GTEPS beyond it.]

Result: 4 times larger graphs with 6.9% performance degradation.

[BigData2014]

Hybrid-BFS switches between the top-down and bottom-up approaches, based on the # of frontiers n_frontier, the # of all vertices n_all, and parameters α, β (see the sketch below).
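A minimal sketch of the direction-optimizing heuristic (after Beamer ’11); the α/β thresholds are the parameters named above, and the exact switching formula here is one common formulation, not necessarily the paper’s:

```python
def hybrid_bfs(adj, root, alpha=15.0, beta=18.0):
    """Per level, choose top-down or bottom-up traversal: switch to
    bottom-up when the frontier's edges exceed the unexplored edges
    divided by alpha; switch back when the frontier shrinks below n/beta."""
    n = len(adj)
    parent = [-1] * n
    parent[root] = root
    frontier, bottom_up = {root}, False
    while frontier:
        m_frontier = sum(len(adj[v]) for v in frontier)
        m_unexplored = sum(len(adj[u]) for u in range(n) if parent[u] == -1)
        if not bottom_up and m_frontier > m_unexplored / alpha:
            bottom_up = True
        elif bottom_up and len(frontier) < n / beta:
            bottom_up = False
        nxt = set()
        if bottom_up:   # unvisited vertices look backwards for a parent
            for u in range(n):
                if parent[u] == -1:
                    for v in adj[u]:
                        if v in frontier:
                            parent[u] = v
                            nxt.add(u)
                            break
        else:           # classic top-down frontier expansion
            for v in frontier:
                for u in adj[v]:
                    if parent[u] == -1:
                        parent[u] = v
                        nxt.add(u)
        frontier = nxt
    return parent
```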

Tokyo Institute of Technology’s GraphCREST-Custom #1 is ranked No.3 in the Big Data category of the Green Graph 500 ranking of supercomputers with 35.21 MTEPS/W on Scale 31, in the third Green Graph 500 list published at the International Supercomputing Conference, June 23, 2014. Congratulations from the Green Graph 500 Chair.

Ranked 3rd in Green Graph500 (June 2014)

EBD Algorithm Kernels

SLIDE 23

Software Technology that Deals with Deeper Memory Hierarchy in Post-petascale Era

JST-CREST project, 2012-2018, PI Toshio Endo

Comm/BW-reducing algorithms + system software for memory hierarchy management + HPC architecture with hybrid memory devices (HMC, HBM at O(GB/s), Flash, next-gen NVM).

Target: realizing extremely fast & big simulations of {O(100PF/s) or O(10PB/s)} & O(10PB) around 2018

SLIDE 24

Supporting Larger domains than GPU device memory for Stencil Simulations

Caution: simply “swapping out” to larger host memory is disastrously slow, as the PCIe traffic is too large! The keys are “communication avoiding & locality improvement” algorithms.

[TSUBAME2.5 node figure: on the GPU card, the GPU cores and a 1.5MB L2$ reach 6GB of GPU memory at 250GB/s; PCIe at 8GB/s links the card to 54GB of host memory and the CPU cores.]

SLIDE 25

Temporal Blocking (TB) for Comm. Avoiding

  • Performs multiple updates on a small block before proceeding to the next block
    – Originally proposed to improve cache locality [Kowarschik 04] [Datta 08]
  • s-step updates at once [figure: steps 1-4 over simulated time]

Redundant computation is introduced due to the data dependency with neighbors, by using a “larger halo” (steps 1-4). The redundancy can be removed when blocks are computed sequentially [Demmel 12].

Multi-level TB to reduce both (see the sketch below):

  • PCIe traffic
  • device memory traffic
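A 1-D toy version of temporal blocking, assuming a 3-point averaging stencil with fixed endpoints (the real codes block in 3-D and also across PCIe transfers):

```python
import numpy as np

def stencil_tb(u, block=256, s=4):
    """Apply s time steps of a 3-point average, one block at a time.
    Each block is read once with a halo of width s (the "larger halo"),
    updated s times locally, then its interior is written back."""
    n = len(u)
    out = u.copy()
    for start in range(0, n, block):
        lo, hi = max(0, start - s), min(n, start + block + s)
        w = u[lo:hi].copy()                # block + halo, loaded once
        for _ in range(s):                 # s updates without re-reading u
            w[1:-1] = (w[:-2] + w[1:-1] + w[2:]) / 3.0
        end = min(start + block, n)
        out[start:end] = w[start - lo:start - lo + (end - start)]
    return out

# Reference: s global sweeps, touching all of u from slow memory each step
def stencil_naive(u, s=4):
    v = u.copy()
    for _ in range(s):
        v[1:-1] = (v[:-2] + v[1:-1] + v[2:]) / 3.0
    return v

u = np.random.rand(2048)
assert np.allclose(stencil_tb(u), stencil_naive(u))
```

The redundant work is the halo updates; the payoff is that each block is moved across the slow link (PCIe or device memory) once per s steps instead of once per step.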
SLIDE 26

[Chart: single-GPU performance, speed (GFlops) vs. size of each dimension (up to ~2000), for Common, Naïve, Basic-TB, and Opt-TB versions.]

  • With optimized TB, a 10x larger domain size (52GB vs. 5.3GB) is successfully used with little overhead!!! A step towards extremely fast & big simulations (3D 7-point stencil on a K20X GPU with 6GB GPU memory).

SLIDE 27

Problem: Programming Cost

  • Communication-reducing algorithms efficiently support larger domains
  • Programming cost is the issue
    – Complex loop structure, complex border handling
  • Reduce programming cost by using system software that supports the memory hierarchy
    – HHRT (Hybrid Hierarchical Runtime)
    – Physis DSL, by Maruyama, RIKEN

SLIDE 28

Memory Hierarchy Management with Runtime Libraries

  • HHRT supports memory swapping between GPU and host memory at the granularity of processes
  • Similar to NVIDIA UVM, but works well with communication-reducing algorithms
  • HHRT (Hybrid Hierarchical Runtime) targets GPU supercomputers and MPI+CUDA user applications
  • HHRT provides MPI- and CUDA-compatible APIs
  • # of MPI processes > # of GPUs: several processes share a GPU (toy model below)
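A toy model of the swapping policy (the real HHRT interposes on MPI/CUDA calls and moves actual device buffers; everything here, including names, is illustrative):

```python
class HHRTSketch:
    """Oversubscribed GPU: more MPI processes than device memory can
    hold.  When a process blocks (e.g., in an MPI wait), its device
    data may be evicted to host memory so a runnable process fits."""
    def __init__(self, gpu_capacity_mb):
        self.cap = gpu_capacity_mb
        self.resident = {}                 # pid -> MB on device
        self.blocked = set()               # pids waiting in MPI

    def request(self, pid, mb):
        """Called when `pid` needs `mb` MB resident to run a kernel."""
        while sum(self.resident.values()) + mb > self.cap:
            victims = [p for p in self.resident if p in self.blocked]
            if not victims:
                raise MemoryError("no blocked process to evict")
            self.swap_out(victims[0])
        self.resident[pid] = self.resident.get(pid, 0) + mb

    def swap_out(self, pid):               # device-to-host copy in HHRT
        print(f"evict pid {pid}: {self.resident.pop(pid)} MB -> host")

rt = HHRTSketch(gpu_capacity_mb=6000)      # a K20X-sized device
rt.request(0, 4000)
rt.blocked.add(0)                          # pid 0 enters an MPI wait
rt.request(1, 4000)                        # forces pid 0 to swap out
```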

SLIDE 29

HHRT Comm. Reducing Results

Larger & faster: 3D 7-point stencil on a single K20X GPU. [Chart: speed (GFlops) vs. problem size (GB) for Hand-TB, NoTB, and HHRT-TB.] Efficient execution beyond GPU memory with moderate programming cost.

Weak scalability on TSUBAME2.5: Small = 3.4GB per GPU, Large = “16GB” per GPU (>6GB!). [Chart: speed (TFlops) vs. the number of GPUs, Small vs. Large.] 14 TFlops with a 3TB problem.

SLIDE 30

Where do we go from here? TSUBAME-KFC and TSUBAME EBD (green and extreme big data) → TSUBAME3.0 (2016) → TSUBAME4.0 (2021~, post-CMOS Moore?)

SLIDE 31

TSUBAME4 (2021~2022): K-in-a-Box (Golden Box), a BD/EC convergent architecture

Versus K: 1/500 size, 1/150 power, 1/500 cost, x5 DRAM+NVM memory. 10 Petaflops, 10 Petabyte hierarchical memory (K: 1.5PB), 10K nodes, 50GB/s interconnect (200-300Tbps bisection BW). (Conceptually similar to HP “The Machine”.)

Datacenter in a box: large datacenters will become “Jurassic”

SLIDE 32

TSUBAME4 (2020-): DRAM+NVM+CPU with 3D/2.5D die stacking

  • The ultimate convergence of BD and EC

[Node figure: PCB and TSV interposer with an optical SW & launch pad; low-power CPUs, each stacked with DRAM x3 and NVM/Flash x3; 2Tbps HBM, 4~6 HBM channels, 2TB/s DRAM & NVM BW; direct chip-chip interconnect with DWDM optics.]

SLIDE 33

GoldenBox “Proto1” (NVIDIA K1-based) at Tokyo Tech. SC14 Booth #1857 (also Wed. morning plenary talk)

  • 36-node Tegra K1, ~11TFlops SFP
  • ~700GB/s BW
  • 100-700 Watts
  • Integrated mSATA SSD, ~7GB/s I/O
  • Ultra dense, oil-immersive cooling
  • Same SW stack as TSUBAME2

2022: x10 Flops, x10 Mem Bandwidth, silicon photonics, x10 NVM, x10 node density, with new device and packaging technologies

SLIDE 34

OUR GOALS

Network Performance Visualization [EuroMPI/Asia 2014 Poster]: MPI abstractions spanning the application layer, MPI layer, and hardware layer, with a performance-analysis process view.

  ① Portably expose MPI’s internal performance
  ② Non-intrusively profile low-level metrics
  ③ Flexible hardware-centric performance analysis

[Chart: MPI_Alltoall comm. latency (%) vs. message size (1 to 32,768 bytes); avg. 2.06%.]

Overhead of our profiler (named ibprof) on the NAS Parallel FT benchmark: runtime overhead is less than 0.02% (12.1919s -> 12.1935s).

[Figure: network visualization of TSUBAME 2.5 running the Graph500 benchmark on 512 nodes.]
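The real ibprof interposes below MPI, at the low-level InfiniBand metrics; purely to illustrate the general timing-interposition idea, here is an mpi4py wrapper one could call in place of Alltoall (names are ours, not ibprof’s):

```python
import numpy as np
from mpi4py import MPI

alltoall_times = []  # per-call latencies collected by the "profiler"

def timed_alltoall(comm, sendbuf, recvbuf):
    """Run MPI Alltoall and record its wall-clock latency."""
    t0 = MPI.Wtime()
    comm.Alltoall(sendbuf, recvbuf)
    alltoall_times.append(MPI.Wtime() - t0)

comm = MPI.COMM_WORLD
send = np.arange(comm.size, dtype='i')   # one element per rank
recv = np.empty(comm.size, dtype='i')
timed_alltoall(comm, send, recv)
```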

EBD Interconnect

SLIDE 35

Design of nonblocking B+Tree for NVM-KV [Tatebe Group, Jabri]

 Take advantage of NVM’s new capabilities: atomic writes, huge sparse address space, direct access to the NVM device natively as a KVS
 Enable range-query support for a KVS running natively on NVM, like the FusionIO ioDrive
 Provide optional persistence for the B+Tree structure, and also snapshots

NVM-BPTree is a key-value store (KVS) running natively over non-volatile memory (NVM), like flash, and supporting range queries: an in-memory B+Tree over an OpenNVM-like key-value store interface to the NVM (Fusion-io flash device).

  • Fusion-io SDK 0.4 and ioDrive 160GB SLC
  • Key size: fixed at 40 bytes
  • Value size: ranging from 1 up to 1024 sectors (512B)
  • NVM-BPTree does not impact performance compared with the original KVS
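To illustrate the range-query capability layered on a KVS, a small in-memory sketch (a sorted array stands in for the B+Tree; the NVM mapping, atomic writes, nonblocking concurrency, and persistence are not modeled):

```python
import bisect

class RangeKVS:
    """KVS keeping keys ordered so range queries are possible, the
    capability NVM-BPTree adds on top of a flat flash KVS."""
    def __init__(self):
        self.keys, self.vals = [], []

    def put(self, key, val):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.vals[i] = val                 # overwrite in place
        else:
            self.keys.insert(i, key)
            self.vals.insert(i, val)

    def range(self, lo, hi):
        """All (key, value) pairs with lo <= key < hi."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        return list(zip(self.keys[i:j], self.vals[i:j]))

kvs = RangeKVS()
for k in ["k03", "k01", "k02", "k10"]:
    kvs.put(k, k.upper())
print(kvs.range("k01", "k04"))
# [('k01', 'K01'), ('k02', 'K02'), ('k03', 'K03')]
```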

EBD NVM System Software

SLIDE 36

LLNL-PRES-654744

Extreme-scale I/O for burst buffers; extreme-scale C/R modeling

EBD I/O and C/R modeling for extreme scale [CCGrid2014 Best Paper]

[Figure: multi-level checkpoint model, a run being a sequence of level-1 and level-2 checkpoints, with the notation: $t$ = checkpoint interval; $c_k$ = level-$k$ checkpoint time; $r_k$ = level-$k$ recovery time; $p_0(T)$, $t_0(T)$ = probability of no failure during a duration $T$ and its expected time; $p_i(T)$, $t_i(T)$ = probability of a level-$i$ failure during $T$ and its expected time. The durations considered are $t + c_k$ (no failure) and $r_k$ (failure).]

Burst buffer hardware: mSATA SSD x8 behind one Adaptec RAID card.

[Chart: read/write throughput (GB/sec) vs. # of processes (2-16): peak, local, IBIO, and NFS, for both read and write.]

IBIO write (four IBIO clients, one IBIO server): the IBIO client on each compute node (1-4) sends chunk buffers to the IBIO server thread on the burst buffer node, which passes them via file descriptors fd1-fd4 to writer threads that write file1-file4 to storage. Application → IBIO client on the compute node; IBIO server + write threads on the burst buffer node.

Model: $L_i = C_i + E_i$; $O_i = C_i + E_i$ (sync.) or $I_i$ (async.)

$C_i$ or $R_i$ = (C/R data size per node $\times$ # of C/R nodes per $S_i$) / (write perf. $w_i$ or read perf. $r_i$)

Storage model: a hierarchy $H_N = \{m_1, m_2, \ldots, m_N\}$, where each level-$i$ storage $S_i$ is shared by $m_i$ units of $H_{i-1}$ ($i = 0$ is the compute node itself, $i > 0$ is shared storage).
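For example, the $C_i$/$R_i$ formula can be evaluated directly; a small sketch with hypothetical numbers (not the paper’s measured values):

```python
def level_time(data_per_node_gb, nodes_per_store, bw_gb_s):
    """C_i or R_i = (C/R data size per node * # of nodes sharing a
    level-i store) / (write or read bandwidth of that store)."""
    return data_per_node_gb * nodes_per_store / bw_gb_s

# Hypothetical: 32 GB/node; level 1 = node-local SSD (1 node/store,
# 1 GB/s write), level 2 = shared PFS (1024 nodes, 50 GB/s aggregate)
c1 = level_time(32, 1, 1.0)       # 32 s   per level-1 checkpoint
c2 = level_time(32, 1024, 50.0)   # ~655 s per level-2 checkpoint
print(c1, c2)
```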

EBD NVM System Software

SLIDE 37

Cloud-based I/O burst buffer architecture, in collaboration talks with Amazon EC2.

[Figure: System 1 and System 2, each with compute nodes and I/O nodes on a LAN, connect over the WAN to I/O bursting buffer nodes (each I/O node holding a buffer queue) in front of Storage 1.]

[Charts: throughput (MB/s), read/write experiment vs. simulation results, with and without I/O nodes; one-client performance vs. # of nodes (1-8), and multi-client performance with # of I/O nodes = # of clients (1-4). Up to 7x improvement.]

Main idea: use several compute nodes in the public cloud as I/O nodes

  • Buffer I/O data in the main memory of the I/O nodes; all I/O nodes maintain an in-memory buffer queue
  • Dynamic burst buffer: the # of I/O nodes can be decided dynamically
  • Takes advantage of the high throughput of the LAN inside the public cloud (see the sketch below)
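A toy model of the buffer-queue idea (illustrative only; the real system runs dedicated I/O node processes, not Python threads):

```python
import queue
import threading

class IONode:
    """Cloud I/O node sketch: absorb writes into an in-memory queue at
    LAN speed and drain them to slow backend storage in the background."""
    def __init__(self, backend_write):
        self.q = queue.Queue()
        t = threading.Thread(target=self._drain, args=(backend_write,),
                             daemon=True)
        t.start()

    def write(self, chunk):          # client-visible fast path
        self.q.put(chunk)

    def _drain(self, backend_write):
        while True:                  # background flush to storage
            backend_write(self.q.get())

node = IONode(backend_write=lambda c: None)  # stand-in for storage/WAN
for block in (b"x" * 4096 for _ in range(1000)):
    node.write(block)                # returns at memory speed
```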

[Chart captions: one-client performance and multi-client performance; 1.7x and 8x improvements, cloud vs. supercomputer.]