SLIDE 1

Welcome the Second Spring of Dataflow and Parallel Computing:

Toward a Path of Convergence for Ecosystems of Extreme-Scale HPC, Big Data and Beyond

Guang R. Gao

ACM Fellow and IEEE Fellow
Endowed Distinguished Professor, University of Delaware, and Founder of ETI

A&M 05-16-2016 1

SLIDE 2

Outline

  • Introduction
  • Second Spring of HPC Parallel Computing
  • New Challenges: HPC vs. Big Data – Divergence or Convergence?

  • The Codelet Model and SWARM
  • Challenges/Opportunities: HPC + Big Data
  • Summary Remarks

SLIDE 3

Looking Back 20+ Years

The pessimism over our field:

  • Is HPC a small and relatively unimportant field?
  • Is parallel computing dead? (Ken Kennedy)
  • Is computer architecture a dead field?
  • Is full artificial intelligence a “fantasy”?
  • The dataflow model of computation suffered a great setback…

SLIDE 4

Looking Back 20+ Years...

  • “Parallel Computing is dead”
  • “Death of computer architecture”
  • “Death of dataflow model of computation”
  • “Death of Artificial Intelligence!”
  • ….

AIST-03-01-2016 talk 4

SLIDE 5

IPDPS2005-Keynote 5

State of Parallel Computer Architecture Innovations

– “…researchers basked in parallel-computing glory. They developed an amazing variety of parallel algorithms for every applicable sequential operation. They proposed every possible structure to interconnect thousands of processors…”

– But “…The market for massively parallel computers has collapsed, and many companies have gone out of business.”

[IEEE Computer, Nov. 1994, pp 74-75]

SLIDE 6

State of Parallel Computer Architecture Innovations

  • “…The term ‘proprietary architecture’ has become pejorative. For computer designers, the revolution is over and only ‘fine tuning’ remains…”

[“End of Architecture”, Burton Smith 1990s]

SLIDE 7

Corporations Vanishing

(1985 – 2005)

[Timeline figure: computer companies that vanished between 1985 and 2005, including Sequent, Thinking Machines, Meiko Scientific, Pyramid, DEC, ETA, MasPar, Convex Computer, nCube, Kendall Square Research, ESCD, Multiflow, Cray Research, BBN, and Myrias]

Keynote at the 2005 IPDPS Conference Denver, CO

SLIDE 8

“Is Parallel Computing Dead?”

  • Ken Kennedy, 1994

“The announcement that Thinking Machines would seek Chapter 11 bankruptcy protection, although not unexpected, sent shock waves through the high-performance computing community. Coupled with the well-publicized problems of Kendall Square Research and the rumored problems of Intel Supercomputer Systems Division, this event has led many people to question the long-term viability of the parallel computing industry and even parallel computing itself. Meanwhile, the dramatic strides in the performance of scientific workstations continues to squeeze the market for parallel supercomputing. On several recent occasions, I have been asked whether parallel computing will soon be relegated to the trash heap reserved for promising technologies that never quite make it. Washington certainly seems to be looking in the other direction--agency program managers, if they talk of high-performance computing at all, seem to view it as a small and relatively unimportant subcomponent of the National Information Infrastructure.”

SLIDE 9

Outline

  • Introduction
  • Second Spring of HPC Parallel Computing
  • New Challenges: HPC vs. Big Data – Divergence or Convergence?

  • The Codelet Model and SWARM
  • Challenges/Opportunities: HPC + Big Data
  • Summary Remarks

SLIDE 10

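“Where the mountains end and the rivers run out, I doubt there is any road; then, beyond the shady willows and bright flowers, another village appears.” (山穷水尽疑无路 柳暗花明又一村)

From “Visiting a Village West of the Mountains” (游山西村), by the Song Dynasty poet Lu You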

SLIDE 11

2005–Present: A Second Spring of HPC Parallel Computing

  • Sequential processing hits serious walls

– Heat wall
– Memory wall
– Other walls

  • Parallel processing appears to provide a powerful alternative for beating the walls
  • Moore’s Law appears to have still enjoyed good years over the past decade

  • Two examples (see next 2 slides)

SLIDE 12

SLIDE 13

[Figure: Cyclops-64 Programming Models and System Software Supports]

  • Programming models: UPC+/-, Co-array Fortran, OpenMP-XN, EARTH-C+/-, MPI, ……
  • Application programming API: Cyclops Thread Virtual Machine (thread management, thread creation & termination, scheduling, dynamic memory management, shared memory operations, put/get with sync, acquire/release, fibers, async function invocation)
  • Kcc/gcc compiler tool chain; Cyclops-64 ISA
  • Advanced execution/programming model topics: fine-grain multithreading, thread synchronization, load balancing, put/get, location consistency, percolation, system software, others
  • Base execution model: fine-grain multithreading (e.g. EARTH, CARE)
  • Infrastructure and tools: simulation/emulation, analytical modeling
  • Hardware: Cyclops-64 chips with thread units (TU), SP, FPUs, and memory banks on a crossbar network; A-switch and DMA; off-chip memory; communication ports for a 3D mesh inter-chip network; link labels 4 GB/sec × 6, 4 GB/sec, 50 MB/sec, 1 Gbit/s Ethernet; IDE HDD; 24 PC cards in 1 shishkebab; 24x24; 1 PetaFlops

SLIDE 14

Outline

  • Introduction
  • The Second Spring of HPC Parallel Computing
  • HPC vs. Big Data – Divergence or Convergence?

  • The Codelet Model and SWARM
  • Challenges/Opportunities: HPC + Big Data
  • Summary Remarks

SLIDE 15

What Is HPC?

High-Performance Computing: The term "high-performance computing" refers to systems that, through a combination of processing capability and storage capacity, can solve computational problems that are beyond the capability of small- to medium-scale systems.

[Obama’s Executive Order]

Gao-03-07-2016 MEXT talk 15

SLIDE 16

What Is Big Data?

Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.

  • Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, and information privacy.

  • The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.

SLIDE 17

Data analytics and computing ecosystem compared

Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015

Data Analytics:

  • Mahout: machine learning tool
  • Hive: data warehouse software
  • Pig: provides a high-level language for big data
  • Sqoop: exchanges data with traditional databases
  • Flume: log management
  • ZooKeeper: maintaining consistency
  • Storm: real-time computation system
  • HBase: a distributed, scalable big data store
  • Avro: data serialization system

Computational Science:

  • FORTRAN, C, C++: languages
  • PAPI: performance and debugging tool
  • MPI/OpenMP: multi-core parallel model
  • SLURM: batch scheduler
  • Lustre: parallel file system

NOTE: The Divergence of Big Data and HPC Eco-Systems!

Waseda-01-26-2016 talk 17

SLIDE 18

Key Insights

  • The tools and cultures of high-performance computing and big data analytics have diverged, to the detriment of both; unification is essential to address a spectrum of major research domains
  • The challenges of scale tax our ability to transmit data, compute complicated functions on that data, or store a substantial part of it; new approaches are required to meet these challenges
  • The international nature of science demands further development of advanced computer architectures and global standards for processing data, even as international competition complicates the openness of the scientific process

Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015

SLIDE 19

Outline

  • Introduction
  • Second Spring of HPC Parallel Computing
  • New Challenges: HPC vs. Big Data – Divergence or Convergence?

  • The Codelet Model and SWARM
  • Challenges/Opportunities: HPC + Big Data
  • Summary Remarks

SLIDE 20

A Quiz: Have you heard the following terms?

  • Actors (dataflow)?
  • strand?
  • fiber?
  • codelet?

SLIDE 21

Coarse-Grain vs. Fine-Grain Multithreading

[Figure: fine-grain non-preemptive threads (the “hotel” model) vs. a coarse-grain thread (the “family home” model); labels: CPU, memory, thread unit, thread pool, executor locus, a single thread]
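The slide contrasts fine-grain, non-preemptive threads, which check in and out of a shared pool of thread units (the “hotel”), with a coarse-grain thread that owns its executor for its whole life (the “family home”). The minimal Python sketch below illustrates that contrast under some assumptions. `NUM_THREAD_UNITS`, `fine_grain_task`, and the two `run_*` helpers are hypothetical names, and a `ThreadPoolExecutor` stands in for hardware thread units. Python’s OS threads can themselves be preempted, so “non-preemptive” here describes only the task discipline: each task runs to completion without blocking.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

NUM_THREAD_UNITS = 4  # assumption: a small, fixed set of executors ("hotel rooms")


def fine_grain_task(i):
    # A short, run-to-completion unit of work. It never blocks or yields partway,
    # so its thread unit is released only when it finishes ("checks out").
    return i * i, threading.current_thread().name


def run_hotel_model(n_tasks):
    """Many fine-grain tasks share a fixed pool of thread units."""
    with ThreadPoolExecutor(max_workers=NUM_THREAD_UNITS,
                            thread_name_prefix="TU") as pool:
        results = list(pool.map(fine_grain_task, range(n_tasks)))
    values = [v for v, _ in results]
    units_used = {name for _, name in results}
    return values, units_used


def run_family_home_model(n_items):
    """One coarse-grain thread owns its executor and does all the work itself."""
    out = []

    def worker():
        for i in range(n_items):
            out.append(i * i)

    t = threading.Thread(target=worker, name="home")
    t.start()
    t.join()
    return out


if __name__ == "__main__":
    values, units = run_hotel_model(100)
    print(sum(values), sorted(units))
    print(sum(run_family_home_model(100)))
```

Both versions compute the same values. The difference lies in how executors are held. In the first version, a hundred tasks pass through at most four thread units. In the second, one thread keeps its executor for the entire computation.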

slide-22
SLIDE 22

What Is a Codelet?

  • Intuitively: a unit of computation that interacts with the global state only at its entrance and exit points
  • Terminology: I do not like to use the term “functional” here, since it usually means “stateless”!

SLIDE 23

Operational Semantics of Codelets: Enabling/Firing Rules

Consider a codelet graph G with an assignment of events on some of its edges:

  • A codelet is enabled if

– an event is present on each of its input edges, and
– none of its output edges has any events.

  • An enabled codelet can be scheduled for execution (i.e., fired). Firing a codelet removes all input events (one from each input edge) and produces output events, one on each output edge.
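The enabling and firing rules can be made concrete with a minimal Python sketch. It assumes each edge holds at most one event, modeled as a dict from edge name to value, and that each codelet body is a plain function. `Codelet`, `is_enabled`, `fire`, and `run` are hypothetical names and do not belong to SWARM or any other runtime.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Codelet:
    name: str
    inputs: List[str]           # names of input edges
    outputs: List[str]          # names of output edges
    body: Callable[..., Tuple]  # one value per input -> tuple, one value per output


def is_enabled(c: Codelet, events: Dict[str, object]) -> bool:
    # Enabled: an event on every input edge and no event on any output edge.
    return (all(e in events for e in c.inputs)
            and not any(e in events for e in c.outputs))


def fire(c: Codelet, events: Dict[str, object]) -> None:
    # Touches global state only at entry (consume inputs) and exit (produce outputs).
    args = [events.pop(e) for e in c.inputs]  # remove one event from each input edge
    results = c.body(*args)
    for e, v in zip(c.outputs, results):      # place one event on each output edge
        events[e] = v


def run(graph: List[Codelet], events: Dict[str, object]) -> List[str]:
    """Fire enabled codelets until none is enabled; return the firing trace."""
    trace = []
    while True:
        ready = [c for c in graph if is_enabled(c, events)]
        if not ready:
            return trace
        chosen = ready[0]  # any enabled codelet may be chosen
        fire(chosen, events)
        trace.append(chosen.name)


if __name__ == "__main__":
    graph = [
        Codelet("add", ["a", "b"], ["s"], lambda x, y: (x + y,)),
        Codelet("double", ["s"], ["out"], lambda s: (2 * s,)),
    ]
    events = {"a": 2, "b": 3}
    print(run(graph, events), events)
```

The second rule is what stops a producer from overwriting an event its consumer has not yet taken. If edge `s` already holds an event, `add` stays disabled until `double` has consumed it.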

SLIDE 24

The Codelet: A Fine-Grain Piece of Computing

[Figure: a codelet consuming data objects and producing a result object]

  • A codelet fires when all its inputs are available.
  • Inputs can be data or resource conditions.
  • Fundamental properties of dataflow: determinacy, repeatability, and composability, among others.

“This Looks Like Data Flow!”

– Jack Dennis

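Determinacy, the first dataflow property listed above, can be shown with a minimal Python sketch. A small dataflow graph is executed many times, and each time the scheduler picks randomly among the enabled nodes. The graph and the names `GRAPH` and `execute` are hypothetical illustrations, not taken from the talk. Each node fires only when its inputs are present and does nothing except produce its output, so every schedule yields the same values.

```python
import random

# Hypothetical dataflow graph: node -> (operation, names of its inputs).
GRAPH = {
    "s": (lambda a, b: a + b, ["a", "b"]),
    "d": (lambda c, e: c - e, ["c", "e"]),
    "p": (lambda s, d: s * d, ["s", "d"]),
    "q": (lambda s: s + 1, ["s"]),
    "r": (lambda p, q: p + q, ["p", "q"]),
}


def execute(inputs, seed):
    """Fire nodes in a random order, constrained only by data availability."""
    rng = random.Random(seed)
    values = dict(inputs)
    pending = set(GRAPH)
    order = []
    while pending:
        ready = sorted(n for n in pending
                       if all(i in values for i in GRAPH[n][1]))
        if not ready:
            raise RuntimeError("no node is enabled: the graph is deadlocked")
        node = rng.choice(ready)  # arbitrary scheduling decision
        op, ins = GRAPH[node]
        values[node] = op(*(values[i] for i in ins))
        pending.remove(node)
        order.append(node)
    return values, order


if __name__ == "__main__":
    inputs = {"a": 2, "b": 3, "c": 7, "e": 4}
    runs = [execute(inputs, seed) for seed in range(50)]
    print({tuple(o) for _, o in runs})  # firing orders may differ between seeds
    print({v["r"] for v, _ in runs})    # every run yields the same value of r
```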
SLIDE 25

Evolution of Multithreaded Execution and Architecture Models

[Figure: timeline of multithreaded execution and architecture models, drawn as two lineages, “Non-dataflow based” and “Dataflow model inspired”. Entries shown: CDC 6600 (1964); MASA (Halstead, 1986); HEP (B. Smith, 1978); Cosmic Cube (Seitz, 1985); J-Machine (Dally, 1988-93); M-Machine (Dally, 1994-98); MIT TTDA (Arvind, 1980); Manchester (Gurd & Watson, 1982); *T/StarT-NG (MIT/Motorola, 1991-); SIGMA-I (Shimada, 1988); Monsoon (Papadopoulos & Culler, 1988); P-RISC (Nikhil & Arvind, 1989); EM-5/4/X, RWC-1 (1992-97); Iannucci’s (1988-92); Flynn’s Processor (1969); CHoPP’77, CHoPP’87; TAM (Culler, 1990); Tera (B. Smith, 1990-); Alewife (Agarwal, 1989-96); Cilk (Leiserson); LAU (Syre, 1976); Eldorado; CASCADE; Static Dataflow (Dennis, MIT, 1972); Argument-Fetching Dataflow (Dennis & Gao, 1987-88); MDFA (Gao, 1989-93); EARTH (Hum et al., 1993-2006); HTVM/TNT-X (del Cuvillo and Gao, 2000-2010); Codelet Model (Gao et al., 2009-); others: Multiscalar (1994), SMT (1995), etc.]

An early version of this slide was presented in my invited talk at Turing Award winner Fran Allen’s retirement party in 2002.

SLIDE 26

DataFlow = Data + Flow

What Is the Dataflow Model?

SLIDE 27

DataFlow = Data + Flow

A Misunderstanding of the Dataflow Model

SLIDE 28

Dataflow Model of Computation (pioneered by J.B. Dennis, early 1970s)

  • A data-driven program execution model (PXM) in which a program unit is enabled for execution at runtime upon the availability of its input data.

  • It has long been viewed as a radical (disruptive) departure from the classical von Neumann model of computation (often referred to as a control-driven or control-flow PXM).
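The data-driven enabling rule takes only a few lines of Python to sketch. The graph and the `data_driven` helper below are illustrative assumptions. A control-driven program would execute the five operations one after another under a program counter. Under the data-driven rule, `t1`, `t2`, and `t3` are all enabled as soon as `a`, `b`, `c`, and `d` are available, so the graph completes in three waves of parallel-enabled operations.

```python
import operator

# Hypothetical dataflow graph for ((a + b) * (c - d)) + (a * c):
# node -> (operation, names of its inputs).
GRAPH = {
    "t1": (operator.add, ["a", "b"]),
    "t2": (operator.sub, ["c", "d"]),
    "t3": (operator.mul, ["a", "c"]),
    "t4": (operator.mul, ["t1", "t2"]),
    "t5": (operator.add, ["t4", "t3"]),
}


def data_driven(inputs):
    """Repeatedly fire every node whose inputs are all available.

    Returns the computed values and the list of "waves". All nodes in one
    wave were enabled at the same moment and could therefore run in parallel.
    """
    values = dict(inputs)
    pending = dict(GRAPH)
    waves = []
    while pending:
        wave = sorted(n for n, (_, ins) in pending.items()
                      if all(i in values for i in ins))
        if not wave:
            raise RuntimeError("no node is enabled: missing input or cycle")
        for n in wave:
            op, ins = pending.pop(n)
            values[n] = op(*(values[i] for i in ins))
        waves.append(wave)
    return values, waves


if __name__ == "__main__":
    values, waves = data_driven({"a": 1, "b": 2, "c": 5, "d": 3})
    print(values["t5"], waves)
```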

SLIDE 29

Inspiration: Jack Dennis

General-purpose parallel machines based on a dataflow graph model of computation. In 2013, Dennis received the IEEE John von Neumann Medal for his major contributions to operating systems and dataflow.

SLIDE 30

Dataflow Model Superiority:

Breaking “Two Walls and One Lock” (两墙一锁)

  • Breaking the serialization barrier to exploiting parallelism imposed by the von Neumann computation model [Arvind 1982]

  • Breaking the von Neumann (CPU-memory) bottleneck: providing a tight, smooth coupling of processing and data [Backus 1977 ACM Turing Award]

  • Unlocking the shackles of the traditional OS and VM [Spark latest news: “Deep Dive Into Databricks’ Big Speedup Plans for Apache Spark”, May 2015]

SLIDE 31

First Second Third Finally

SLIDE 32

What is SWARM?

SLIDE 33

What Is SWARM? (cont’d)

SLIDE 34

[Figure: SWARM in the software stack. Layers shown: users; programming environment platforms (software packages, program libraries, utility applications, compilers, tools/SDK); programming models and high-level programming APIs (MPI, OpenMP, CnC, Xio, Chapel, etc.) with their language runtimes; the SWARM execution model and its execution model API; the SWARM runtime (SWift Abstract Runtime Machine); exascale hardware architecture]

SLIDE 35

What is SWARM?

  • SWARM = SWift Adaptive Runtime Machine
  • SWARM is a commercialization of an Abstract Codelet Machine (ACM)
  • SWARM is developed and marketed by ETI, a small business based in Delaware
  • SWARM is available to academia under a special license and agreement

SLIDE 36

US DOE X-Stack Program: 9 Awardees

  • D-TEC: http://www.dtec-xstack.org
  • XPRESS: http://xstack.sandia.gov/xpress
  • XTUNE: http://ctop.cs.utah.edu/x-tune/
  • GVR: http://gvr.cs.uchicago.edu
  • Traleika Glacier: https://sites.google.com/site/traleikaglacierxstack/
  • DynAX: http://www.etinternational.com/xstack
  • SLEEC: https://engineering.purdue.edu/~milind/sleec/
  • DEGAS: http://crd.lbl.gov/groups-depts/future-technologies-group/projects/DEGAS/
  • CORVETTE: http://crd.lbl.gov/groups-depts/future-technologies-group/projects/corvette/

(The slide also carries the label “Codelet based”.)

SLIDE 37

Programming & Execution Model

Event-driven tasks (EDTs), inspired by the Delaware codelet model:

  • Dataflow-inspired codelets (self-contained/“atomic”)
  • Non-blocking, no preemption

Programming model:

  • Separation of concerns: domain specification and HW mapping
  • Express data locality with hierarchical tiling
  • Global, shared, non-coherent address space
  • Optimization and auto-generation of EDTs (HW specific)

Execution model:

  • Dynamic, event-driven, non-blocking scheduling
  • Dynamic decision to move computation to data
  • Observation-based adaptation (self-awareness)
  • Implemented in the runtime environment
  • Separation of concerns: user application, control, and resource management
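Dependence counters are one common way to illustrate the execution-model bullets: event-driven scheduling, non-blocking tasks, and no preemption. The sketch below is a single-threaded Python toy under stated assumptions. `EDT`, `run`, and `demo` are hypothetical names, not the API of any X-Stack runtime. A real runtime would dispatch ready tasks to many workers and would also decide where data and computation live.

```python
from collections import deque


class EDT:
    """Hypothetical event-driven task. Its body runs to completion, without
    blocking, once every dependence it was created with has been satisfied."""

    def __init__(self, name, body, deps=()):
        self.name = name
        self.body = body
        self.unsatisfied = len(deps)  # dependence counter
        self.successors = []
        for d in deps:
            d.successors.append(self)


def run(tasks):
    """Single-threaded event-driven scheduler; returns the execution order."""
    ready = deque(t for t in tasks if t.unsatisfied == 0)
    order = []
    while ready:
        task = ready.popleft()
        task.body()                   # runs to completion; never preempted here
        order.append(task.name)
        for succ in task.successors:  # the completion event satisfies successors
            succ.unsatisfied -= 1
            if succ.unsatisfied == 0:
                ready.append(succ)
    return order


def demo():
    data = {}
    load = EDT("load", lambda: data.update(x=[1, 2, 3]))
    scale = EDT("scale", lambda: data.update(y=[2 * v for v in data["x"]]),
                deps=[load])
    shift = EDT("shift", lambda: data.update(z=[v + 1 for v in data["x"]]),
                deps=[load])
    combine = EDT("combine",
                  lambda: data.update(w=[a + b for a, b in zip(data["y"], data["z"])]),
                  deps=[scale, shift])
    order = run([combine, shift, scale, load])  # submission order does not matter
    return order, data


if __name__ == "__main__":
    print(demo())
```

Tasks are submitted in reverse order, yet `combine` runs only after both `scale` and `shift` have completed. Scheduling is driven entirely by completion events, not by program order.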

SLIDE 38

Outline

  • Introduction
  • Second Spring of HPC Parallel Computing
  • New Challenges: HPC vs. Big Data – Divergence or Convergence?

  • The Codelet Model and SWARM
  • Challenges/Opportunities: HPC + Big Data
  • Summary Remarks

SLIDE 39

Data analytics and computing ecosystem compared

Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015

[Figure: side-by-side stacks (application level; application-level middleware and management; system software; cluster hardware) for the two ecosystems]

  • Data analytics ecosystem: Mahout, R, and applications; Hive, Pig, Sqoop, Flume, Storm, Map-Reduce, Avro, HBase, BigTable (key-value store), HDFS (Hadoop File System), ZooKeeper (coordination), cloud services (e.g. AWS); virtual machines and cloud services (optional); Linux OS variant; Ethernet switches, local node storage, commodity X86 racks
  • Computational science ecosystem: applications and community codes; FORTRAN, C, C++, and IDEs; domain-specific libraries; MPI/OpenMP + accelerator tools; numerical libraries; performance and debugging tools; Lustre (parallel file system); batch scheduler (such as SLURM); system monitoring tools; Linux OS variant; Infiniband + Ethernet switches; SAN + local node storage; X86 racks + GPUs or accelerators

NOTE: The Divergence of Big Data and HPC Eco-Systems!

SLIDE 40

Issues and Challenges

  • The Three Gaps?
  • How to handle the divergence (gap) between the ecosystems of extreme-scale computation and big data?
  • How to bridge the gaps between data and knowledge, and between knowledge and “$$”?
  • How best to encourage and guide innovation and entrepreneurship to bridge the gaps?

SLIDE 41

Innovation and Entrepreneurship – Jointly Funded by Public-Private Sector Partnerships

  • Most recently: President Obama and the DoD went to Silicon Valley and announced a “cooperation” with private-sector VCs to promote and fund entrepreneurship for US mission-driven innovation
  • Small business participation is critical for ground-breaking R&D
  • Very pleased to witness new momentum on the Japanese side
  • A broad field is opening up for Japan-US collaboration
  • An Example

SLIDE 42

From My Own Experience: Three Case Studies

SLIDE 43

Case I: Cyclops-64 Project

[Figure: Cyclops-64 system hierarchy]

  • Cyclops-64 system: 1.1 Petaflops, 13 TB memory
  • Processor rack: 11.52 Teraflops, 144 GB memory
  • Mid-plane: 3.84 Teraflops, 48 GB memory
  • Node card: 80 Gigaflops, 1 GB memory
  • Cyclops-64 ASIC: 80 processors + 16 iCaches + 96-port crossbar switch
  • C64 processor: 2 thread units + 60 KB SRAM + 1 FP unit

  • Cyclops-64 system (Blue Gene/C)

– employs first-generation, large-scale many-core chip technology (160 cores/chip, up to 10,000 chips/system)

  • In the Cyclops-64 system: leveraging the dataflow model to break the OS barrier
  • Invention of the TiNy-Threads (TNT) PXM
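The per-level figures in the Cyclops-64 hierarchy are internally consistent, and a few lines of arithmetic confirm it. The Python below only restates the slide's numbers, and its variable names are illustrative. 3.84 Teraflops / 80 Gigaflops gives 48 node cards per mid-plane, and 11.52 / 3.84 gives 3 mid-planes per rack. Memory scales by the same factors. Dividing the system total by the rack rate does not give a whole number, because 1.1 Petaflops is a rounded figure.

```python
# Figures as stated on the slide: peak GFLOPS and memory (GB) per level.
NODE_GFLOPS, NODE_GB = 80, 1
MIDPLANE_GFLOPS, MIDPLANE_GB = 3_840, 48
RACK_GFLOPS, RACK_GB = 11_520, 144
SYSTEM_GFLOPS = 1_100_000  # "1.1 Petaflops", a rounded figure

PROCESSORS_PER_CHIP = 80
THREAD_UNITS_PER_PROCESSOR = 2

nodes_per_midplane = MIDPLANE_GFLOPS // NODE_GFLOPS  # node cards per mid-plane
midplanes_per_rack = RACK_GFLOPS // MIDPLANE_GFLOPS  # mid-planes per rack
racks_estimate = SYSTEM_GFLOPS / RACK_GFLOPS         # not an integer, since 1.1 PF is rounded

# Memory grows with the same fan-out as compute.
assert nodes_per_midplane * NODE_GB == MIDPLANE_GB
assert midplanes_per_rack * MIDPLANE_GB == RACK_GB

# 80 processors x 2 thread units matches the "160 core/chip" quoted on the slide.
thread_units_per_chip = PROCESSORS_PER_CHIP * THREAD_UNITS_PER_PROCESSOR

print(nodes_per_midplane, midplanes_per_rack,
      round(racks_estimate, 1), thread_units_per_chip)
```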

slide-44
SLIDE 44

Big Data Ecosystem vs. HPC Ecosystem

[Figure: the two ecosystems compared layer by layer]

  • Application: data analytics, machine learning, real-time processing, financial applications (Big Data); computational chemistry, bioinformatics, computational physics, domain-specific applications (HPC)
  • Programming model: Map-Reduce based API, Spark RDD API, Storm API, Flowlet-based API (Big Data); MPI API, OpenMP API, Codelet API, TiNy-Threads API (HPC)
  • System software: Hadoop, Spark, Storm, CrAMER/HAMR (Big Data); MPI, OpenMP, SWARM, OpenCL, EARTH, Cilk (HPC)
  • Hardware: Ethernet switches, local node storage, commodity X86 racks (Big Data); Infiniband and Ethernet switches, SAN, local node storage, X86 racks, GPUs or accelerators (HPC)

SLIDE 45

[Figure: the Big Data vs. HPC ecosystem comparison by layer (hardware, system software, programming model, application), as on the previous slide, annotated “DataFlow Model Inspired Runtime System”]

SLIDE 46

Outline

  • Introduction
  • Obama’s Executive Order
  • The Codelet Model and SWARM
  • Challenges/Opportunities: HPC + Big Data
  • Summary Remarks

SLIDE 47

My Remarks on the “Second Spring”

  • Entrepreneurship and innovation are critical for success in the “second spring”!
  • But we must never forget the lessons we learned in the first spring!

SLIDE 48

Wang Xing – Forbes Profile

  • Wang Xing on Forbes Lists: #83, China Rich List (2015)

Wang Xing cracks the top 100 richest in China for the first time. Wang created the largest “online-to-offline” commerce company in China. Wang, a student of Prof. Guang R. Gao at the University of Delaware, went back to China in 2005 and started his career in innovation and entrepreneurship.

http://www.forbes.com/profile/wang-xing/

Latest News: Chinese startup raises largest private funding round ever!

Published: Jan 19, 2016 9:53 a.m. ET

SLIDE 49

The Three-Way JV Model

  • Three-Way Investment

Public Support + Private Industry Giant + Small Entrepreneur

  • My Own Experience

SLIDE 50

Acknowledgements

  • Sponsors: DOE, DOD, NSF etc.
  • Frederica Darema, AFOSR (DDDAS)
  • Colleagues and collaborators
  • Intel, UIUC, Indiana U/LSU, Rice, and many others in the DOE X-Stack Program
  • Waseda University (Prof. Kasahara and others)
  • Japan SGU Program and Dean Sugano
  • University of Delaware, ETI and CAPSL
