SLIDE 1

How I Learned to Stop Worrying About User-Visible Endpoints and Love MPI

Rohit Zambre,* Aparna Chandramowlishwaran,* Pavan Balaji^

*University of California, Irvine
^Argonne National Laboratory

SLIDE 2

MPI everywhere

[Diagram: node with cores, one MPI process per core; legend: Node, Core, Process]

SLIDE 3

MPI everywhere


▸ Model artifact: high memory requirements that worsen with increasing domain dimensionality and number of ranks.

▸ Hardware usage: resources are wasted by the static split of the processor's limited resources across processes.

(Trend: increasing number of cores, decreasing memory per core.)

SLIDE 4

MPI+threads

▸ Model artifact: duplicated data is reduced by a factor of the number of threads.

▸ Hardware usage: all of the cores can be used while the processor's resources are shared among threads.

(Trend: increasing number of cores, decreasing memory per core.)

[Diagram: node with cores, one process spanning the cores with one thread per core; legend: Node, Core, Thread, Process]
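To make the MPI+threads model concrete, the following is a minimal sketch (not from the slides) of a hybrid MPI + OpenMP program that requests MPI_THREAD_MULTIPLE so every thread may call MPI, assuming every rank runs the same number of threads; the ring exchange, message size, and use of the thread ID as the tag are illustrative choices.

#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int provided, rank, nranks;
    /* Ask for full thread support: any thread may call MPI concurrently. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int next = (rank + 1) % nranks, prev = (rank + nranks - 1) % nranks;
        int sendbuf = rank * 1000 + tid, recvbuf = -1;
        MPI_Request reqs[2];
        /* All threads share MPI_COMM_WORLD, so the MPI library must protect
         * its internal state; the thread ID serves as the matching tag. */
        MPI_Irecv(&recvbuf, 1, MPI_INT, prev, tid, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendbuf, 1, MPI_INT, next, tid, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    MPI_Finalize();
    return 0;
}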

SLIDE 5

[Chart: time (seconds), broken into Computation, Allgatherv, and Alltoallv, for processor grids (threads x processor rows x processor columns) 1x440x110, 1x220x220, 1x110x440, 6x180x45, 6x90x90, and 6x45x180]

MPI everywhere: OOM!

Corresponding MPI+threads runs

Buluc et al., Distributed BFS (https://arxiv.org/abs/1705.04590)

SLIDE 6

[Chart: MPI_Isend (8 B) message rate (million messages/s) vs. number of cores (1–16) for MPI everywhere, MPI+threads (MPI_THREAD_FUNNELED), and MPI+threads (MPI_THREAD_MULTIPLE)]

Communication performance of MPI+threads is dismal


SLIDE 7


Outdated view: the network is a single device. Modern reality: the network features parallelism.

[Diagram: Network Interface Card as a single device vs. Network Interface Card with multiple network hardware contexts]

SLIDE 8

[Diagram: MPI everywhere (processes P0–P3) vs. MPI+threads (a single process P0), each above the Application, MPI library, and Network Interface Card layers; legend: network hardware context, software communication channel]

SLIDE 9

[Diagram repeated from Slide 8. Annotation on MPI+threads: global critical section + one communication channel per process.]

SLIDE 10

[Diagram repeated from Slide 8. Annotations on MPI+threads: no logical parallelism expressed; global critical section + one communication channel per process.]

SLIDE 11

MPI_Comm_create_endpoints(…, num_ep, …, comm_eps[]);
MPI_Isend/Irecv(…, comm_eps[tid], ep_rank, …);

[Diagram: an MPI process whose threads drive multiple MPI endpoints inside the MPI library, each endpoint backed by a network hardware context on the Network Interface Card; legend: network hardware context, MPI endpoint, software communication channel, MPI communicator]
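The following is a minimal sketch of how the proposed user-visible endpoints interface could be driven from an OpenMP region. MPI_Comm_create_endpoints is the proposal discussed in this talk, not part of MPI-3.1, so the exact signature and semantics shown here are assumptions for illustration only.

/* Hypothetical sketch: MPI_Comm_create_endpoints is a PROPOSED extension,
 * not part of MPI-3.1; the signature below paraphrases the calls shown on
 * this slide and may not match any real implementation. */
#include <mpi.h>
#include <omp.h>

void exchange_with_endpoints(MPI_Comm parent, int num_ep, int peer_ep_rank) {
    MPI_Comm comm_eps[num_ep];
    /* Each returned communicator handle represents one endpoint; a thread
     * uses "its" handle, so the MPI library sees one explicit stream per thread. */
    MPI_Comm_create_endpoints(parent, num_ep, MPI_INFO_NULL, comm_eps);

    #pragma omp parallel num_threads(num_ep)
    {
        int tid = omp_get_thread_num();
        int sendbuf = tid, recvbuf = -1;
        MPI_Request reqs[2];
        MPI_Irecv(&recvbuf, 1, MPI_INT, peer_ep_rank, 0, comm_eps[tid], &reqs[0]);
        MPI_Isend(&sendbuf, 1, MPI_INT, peer_ep_rank, 0, comm_eps[tid], &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    for (int i = 0; i < num_ep; i++)
        MPI_Comm_free(&comm_eps[i]);
}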

SLIDE 12


Pros

▸ Explicit control over network contexts

Cons

▸ Intrusive extension of the MPI standard
▸ Onus of managing network contexts falls on the user

SLIDE 13

[Diagram: MPI everywhere (P0–P3) vs. MPI+threads (P0 with communicators C0–C3). Annotation on MPI+threads: logical parallelism expressed; fine-grained critical sections + multiple communication channels per process. Legend: network hardware context, software communication channel, MPI communicator]

SLIDE 14


Do we need user-visible endpoints?

SLIDE 15

CONTRIBUTIONS AS DEVIL’S ADVOCATE

▸ In-depth comparison between MPI-3.1 and user-visible endpoints
▸ A fast MPI+threads library that adheres to MPI-3.1's constraints
▸ Optimized parallel communication streams applicable to all MPI libraries
▸ Recommendations for the MPI user to express logical parallelism with MPI-3.1


Evaluation platforms

MPI library

▸ Based on MPICH:CH4

Interconnects

▸ Intel Omni-Path (OPA) with OFI:PSM2
▸ Mellanox InfiniBand (IB) with UCX:Verbs

SLIDE 16

OUTLINE


▸ Introduction
▸ For MPI users: Parallelism in the MPI standard
▸ For MPI developers: Fast MPI+threads
  ▸ Fine-grained critical sections for thread safety
  ▸ Virtual Communication Interfaces (VCIs) for parallel communication streams
▸ Microbenchmark and application analysis

SLIDE 17

POINT-TO-POINT COMMUNICATION

▸ <comm, rank, tag> decides matching
▸ Non-overtaking order
▸ Receive wildcards

Two or more operations on a process with a given <Comm, Rank, Tag>: can they (Send / Recv) be issued on parallel communication streams?

SLIDE 18

POINT-TO-POINT COMMUNICATION

▸ <comm, rank, tag> decides matching
▸ Non-overtaking order
▸ Receive wildcards

Example: Rank 0 (sender) issues <CA,R1,T1> and <CB,R1,T1>; Rank 1 (receiver) posts <CA,R0,T1> and <CB,R0,T1>.

Comm      | Rank              | Tag               | Send | Recv
----------+-------------------+-------------------+------+-----
Different | Different or Same | Different or Same | Yes  | Yes

SLIDE 19

POINT-TO-POINT COMMUNICATION

▸ <comm, rank, tag> decides matching
▸ Non-overtaking order
▸ Receive wildcards

Example: Rank 0 (sender) issues <CA,R1,T1> and <CA,R2,T1>; Rank 1 (receiver) posts <CA,R0,T1> and <CA,ANY,T1>.

Comm      | Rank              | Tag               | Send | Recv
----------+-------------------+-------------------+------+-----
Different | Different or Same | Different or Same | Yes  | Yes
Same      | Different         | Different or Same | Yes  | No  (wildcards)

SLIDE 20

POINT-TO-POINT COMMUNICATION

▸ <comm, rank, tag> decides matching
▸ Non-overtaking order
▸ Receive wildcards

Example: Rank 0 (sender) issues <CA,R1,T1> and <CA,R1,T2>; Rank 1 (receiver) posts <CA,R0,T3> and <CA,R0,ANY>.

Comm      | Rank              | Tag               | Send | Recv
----------+-------------------+-------------------+------+-----
Different | Different or Same | Different or Same | Yes  | Yes
Same      | Different         | Different or Same | Yes  | No  (wildcards)
Same      | Same              | Different or Same | No   | No  (wildcards, non-overtaking order)
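As a concrete illustration of the first table row, the sketch below (my own, not from the slides) gives each thread its own duplicated communicator so that its sends and receives match independently and can, in principle, flow on parallel communication streams; the name comm_per_thread, the peer argument, and the message size are placeholders.

#include <mpi.h>
#include <omp.h>

/* Sketch: one communicator per thread so point-to-point traffic from
 * different threads matches independently (row 1 of the table: different
 * comms => parallel streams are safe for both sends and receives). */
void thread_parallel_pt2pt(int nthreads, int peer) {
    MPI_Comm comm_per_thread[nthreads];
    for (int i = 0; i < nthreads; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &comm_per_thread[i]);

    #pragma omp parallel num_threads(nthreads)
    {
        int tid = omp_get_thread_num();
        int sendbuf = tid, recvbuf = -1;
        MPI_Request reqs[2];
        /* The same tag on every thread is fine: the distinct communicators
         * already keep the matching spaces separate. */
        MPI_Irecv(&recvbuf, 1, MPI_INT, peer, 0, comm_per_thread[tid], &reqs[0]);
        MPI_Isend(&sendbuf, 1, MPI_INT, peer, 0, comm_per_thread[tid], &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    for (int i = 0; i < nthreads; i++)
        MPI_Comm_free(&comm_per_thread[i]);
}

Both peers are assumed to call this with the same nthreads so that communicator i on one rank matches communicator i on the other.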

SLIDE 21

RMA COMMUNICATION

Two or more operations on a process with a given <Window, Rank>: can they (Put / Get / Accumulate) be issued on parallel communication streams?

Window    | Rank              | Put | Get | Accumulate
----------+-------------------+-----+-----+-----------
Different | Different or Same | Yes | Yes | Yes
Same      | Different         | Yes | Yes | Yes
Same      | Same              | Yes | Yes | No

SLIDE 22

RMA COMMUNICATION

Window    | Rank              | Put | Get | Accumulate
----------+-------------------+-----+-----+-----------
Different | Different or Same | Yes | Yes | Yes
Same      | Different         | Yes | Yes | Yes
Same      | Same              | Yes | Yes | No

▸ Explicitly expressing parallelism (different windows or ranks)
▸ Implicit parallelism: no order between multiple Gets and Puts

SLIDE 23

RMA COMMUNICATION

Window    | Rank              | Put | Get | Accumulate
----------+-------------------+-----+-----+-----------
Different | Different or Same | Yes | Yes | Yes
Same      | Different         | Yes | Yes | Yes
Same      | Same              | Yes | Yes | No

▸ Explicitly expressing parallelism (different windows or ranks)
▸ Implicit parallelism: no order between multiple Gets and Puts
▸ Accumulate is "No" because of the ordering of accumulate operations to the same memory location
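To show what "different windows" means in practice, here is a small sketch (mine, with hypothetical names and sizes) in which each thread drives RMA operations on its own window, so its operations fall under the first table row and can map to independent communication streams.

#include <mpi.h>
#include <omp.h>

/* Sketch: one MPI window per thread so each thread's RMA operations target a
 * different window (row 1 of the RMA table). COUNT and the datatype are
 * illustrative choices. */
#define COUNT 1024

void thread_parallel_rma(int nthreads, int target) {
    MPI_Win win[nthreads];
    double *base[nthreads];

    /* Collective: every rank allocates one window per thread. */
    for (int i = 0; i < nthreads; i++)
        MPI_Win_allocate(COUNT * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &base[i], &win[i]);

    #pragma omp parallel num_threads(nthreads)
    {
        int tid = omp_get_thread_num();
        double local[COUNT];
        MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win[tid]);
        MPI_Get(local, COUNT, MPI_DOUBLE, target, 0, COUNT, MPI_DOUBLE, win[tid]);
        MPI_Win_flush(target, win[tid]);   /* completes only this thread's Get */
        MPI_Win_unlock(target, win[tid]);
    }

    for (int i = 0; i < nthreads; i++)
        MPI_Win_free(&win[i]);
}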

SLIDE 24

OUTLINE


▸ Introduction
▸ For MPI users: Parallelism in the MPI standard
▸ For MPI developers: Fast MPI+threads
  ▸ Fine-grained critical sections for thread safety
  ▸ Virtual Communication Interfaces (VCIs) for parallel communication streams
▸ Microbenchmark and application analysis

SLIDE 25

DESERIALIZING ACCESS TO THE MPI LIBRARY

▸ State of the art: global critical section
▸ Adopt fine-grained critical sections (Balaji et al., Amer et al.)
  ▸ Higher parallelism
  ▸ More lock acquisitions
  ▸ Atomics for counters


[Charts: 8 B MPI_Isend message rate (x10^6 messages/s) on MPICH/OFI/OPA with Global vs. fine-grained (FG) critical sections, for a single thread and for 1–16 threads]

FG has overhead in the single-thread case; FG outperforms Global at higher thread counts.

SLIDE 26

PARALLEL COMMUNICATION STREAMS

▸ Virtual Communication Interfaces (VCIs)
  ▸ Independent set of communication resources with FIFO order
  ▸ Each VCI protected by its own lock
  ▸ Maps to a network hardware context
▸ VCI pool
  ▸ Allocate a VCI to a communicator/window
  ▸ Fallback VCI

[Diagram: MPI process with communicators C0–C3, each backed by a VCI inside the MPI library; each VCI holds its own queues (TX Q, RX Q, C Q) and maps to a network hardware context on the Network Interface Card]
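The sketch below is a rough, hypothetical rendering (not MPICH's actual code) of what an "independent set of communication resources protected by its own lock" might look like as a data structure, with the per-VCI queues from the diagram and the cache-line alignment mentioned later in the talk.

#include <pthread.h>
#include <stdint.h>

/* Hypothetical sketch of a VCI: every field is per-VCI so that threads
 * operating on different VCIs never touch shared state. Names and layout
 * are illustrative, not MPICH's real internals. */
#define CACHE_LINE 64

struct msg_queue {           /* FIFO of pending operations (illustrative) */
    void *head, *tail;
};

struct vci {
    pthread_mutex_t lock;        /* per-VCI critical section                */
    struct msg_queue tx_q;       /* transmit queue (TX Q)                   */
    struct msg_queue rx_q;       /* receive queue (RX Q)                    */
    struct msg_queue c_q;        /* completion queue (C Q)                  */
    int      hw_context_id;      /* network hardware context it maps to     */
    uint64_t posted, completed;  /* per-VCI counters, no shared atomics     */
} __attribute__((aligned(CACHE_LINE)));  /* avoid false sharing between VCIs */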

SLIDE 27

[Chart: 8 B MPI_Isend message rate (x10^6 messages/s) vs. number of threads (1–16) on MPICH/OFI/OPA for Original (Global + 1 VCI), FG, and FG + multiple VCIs]

Fine-grained critical sections + multiple VCIs alone give practically no benefit

SLIDE 28

[Chart: 8 B MPI_Isend message rate (x10^6 messages/s) vs. number of cores (1–16) on MPICH/OFI/OPA for Original (Global + 1 VCI), All optimizations, and All w/o per-VCI progress]

Per-VCI progress

▸ Global progress: progress all VCIs
  ▸ High contention on the VCIs' locks
▸ Pure per-VCI progress: progress only the VCI of the operation
  ▸ Deadlock when shared progress is required
▸ Hybrid per-VCI progress
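As a rough illustration (my own pseudocode-style C, not the MPICH implementation), hybrid per-VCI progress can be pictured as polling only the operation's own VCI on the fast path while occasionally poking every VCI so that operations needing shared progress cannot be starved; poll_vci, NUM_VCIS, and the polling interval are hypothetical.

/* Illustrative sketch of hybrid per-VCI progress. */
#define NUM_VCIS 16
#define GLOBAL_POLL_INTERVAL 64

extern int poll_vci(int vci);              /* assumed: drains one VCI's queues */

void progress_wait(int my_vci, volatile int *done) {
    unsigned iter = 0;
    while (!*done) {
        poll_vci(my_vci);                  /* fast path: only this op's VCI   */
        if (++iter % GLOBAL_POLL_INTERVAL == 0) {
            /* Slow path: occasionally progress every VCI so requests that
             * depend on other VCIs (shared progress) cannot deadlock. */
            for (int v = 0; v < NUM_VCIS; v++)
                if (v != my_vci)
                    poll_vci(v);
        }
    }
}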

SLIDE 29

[Charts: 8 B MPI_Isend message rate (x10^6 messages/s) vs. number of cores (1–16) on MPICH/OFI/OPA for Original (Global + 1 VCI), All optimizations, All w/o per-VCI progress, and All w/o per-VCI request management]

Per-VCI progress: see Slide 28.

Per-VCI request management
▸ Request class lock: high contention
▸ Per-VCI request cache
▸ Global lightweight request: contended atomics for refcounting
▸ Per-VCI lightweight request

SLIDE 30

[Charts: 8 B MPI_Isend message rate (x10^6 messages/s) vs. number of cores (1–16) on MPICH/OFI/OPA for Original (Global + 1 VCI), All optimizations, All w/o per-VCI progress, All w/o per-VCI request management, and All w/o cache-aware VCIs]

Per-VCI progress: see Slide 28. Per-VCI request management: see Slide 29.

Per-VCI cache-line awareness
▸ False sharing: locks of consecutive VCIs
▸ Per-VCI cache alignment

SLIDE 31

[Charts and bullets repeated from Slide 30.]

SLIDE 32

OUTLINE


▸ Introduction
▸ For MPI users: Parallelism in the MPI standard
▸ For MPI developers: Fast MPI+threads
  ▸ Fine-grained critical sections for thread safety
  ▸ Virtual Communication Interfaces (VCIs) for parallel communication streams
▸ Microbenchmark and application analysis

SLIDE 33

APPLICATION CATEGORIES

▸ Category 1
  ▸ Direct use of parallel communication streams
  ▸ VCIs as good as user-visible endpoints and MPI everywhere
▸ Category 2
  ▸ Require shared progress
  ▸ Both VCIs and user-visible endpoints perform poorly
▸ Category 3
  ▸ Abstraction through MPI-3.1 prevents the user from expressing parallelism
  ▸ User-visible endpoints perform better than VCIs


SLIDE 34

CATEGORY 1: POINT-TO-POINT MICROBENCHMARK


[Charts: Isend message rate (x10^6 messages/s) vs. number of cores (1–16) on MPICH/OFI/OPA and MPICH/UCX/IB for MPI Everywhere, +Threads (ser_comm+orig_mpich), +Threads (par_comm+orig_mpich), +Threads (par_comm+vcis), +Threads (ser_comm+vcis), and +Threads (Endpoints)]

No scaling without user-expressed parallelism (ser_comm) or without VCIs (orig_mpich)

SLIDE 35

CATEGORY 1: POINT-TO-POINT MICROBENCHMARK


[Charts: Isend message rate (x10^6 messages/s) vs. message size (1 B–64 KiB) at 16 cores, and vs. number of cores (1–16), on MPICH/OFI/OPA and MPICH/UCX/IB, for the same six configurations as Slide 34]

No scaling without user-expressed parallelism (ser_comm) or without VCIs (orig_mpich). Parallel communication streams are effective only when bound by the rate of issue of operations.
SLIDE 36

CATEGORY 1: POINT-TO-POINT MICROBENCHMARK

[Chart: Isend message rate (x10^6 messages/s) on MPICH/UCX/IB for MPI+Threads (+no atomics), MPI+Threads (+no locks), MPI+Threads, and MPI Everywhere]

VCIs and user-visible endpoints fall short of MPI everywhere due to thread-safety costs.

Takeaway: For basic communication, VCIs and endpoints perform similarly and nearly as well as MPI everywhere.

SLIDE 37

CATEGORY 1: STENCIL APPLICATIONS

[Diagram: four processes (P0–P3), each with four threads (T0–T3). With MPI-3.1, threads exchange halos using direction-specific communicators (EW_A, EW_B, NS_A, NS_B); with user-visible endpoints, each thread uses its own endpoint (EP_0–EP_3) with a distinct endpoint rank (R0–R15). Legend: MPI endpoint, MPI communicator]

SLIDE 38

CATEGORY 1: STENCIL APPLICATIONS

[Chart: halo communication time per iteration (ms) vs. mesh dimension (48–196608) on 9 nodes, 16 cores per node, MPICH/OFI/OPA, for MPI Everywhere, +Threads (Original), +Threads (VCIs), +Threads (Endpoints), and +Threads (FUNNELED)]

Recommendation: Maximize independence between threads for point-to-point communication with communicators (see the sketch below).
Warning: Independent communication with ranks or tags alone is not sufficient because of receive wildcards.
Warning: Expressing parallelism with MPI-3.1 can be clumsier than with user-visible endpoints because of the matching requirements.
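A minimal sketch (mine, not the authors' code) of the recommendation above, reduced to a 1-D north–south exchange: each of two threads owns one direction and all traffic in a given direction travels on its own duplicated communicator, echoing the NS_A/NS_B communicators in the diagram on Slide 37. Neighbor ranks, buffer sizes, and communicator names are placeholders, and the clumsiness the warning mentions shows up in having to agree on which communicator carries which direction.

#include <mpi.h>
#include <omp.h>

#define HALO 1024   /* illustrative halo size (elements) */

/* comm_up carries all northward messages, comm_down all southward ones, so a
 * thread's receive matches only its partner's send on the same communicator.
 * north/south are neighbor ranks (MPI_PROC_NULL at domain edges). */
void ns_halo_exchange(int north, int south,
                      double to_north[HALO],   double to_south[HALO],
                      double from_north[HALO], double from_south[HALO]) {
    MPI_Comm comm_up, comm_down;   /* in real code, created once, not per call */
    MPI_Comm_dup(MPI_COMM_WORLD, &comm_up);
    MPI_Comm_dup(MPI_COMM_WORLD, &comm_down);

    #pragma omp parallel num_threads(2)
    {
        MPI_Request reqs[2];
        if (omp_get_thread_num() == 0) {      /* thread 0 talks to the north neighbor */
            MPI_Irecv(from_north, HALO, MPI_DOUBLE, north, 0, comm_down, &reqs[0]);
            MPI_Isend(to_north,   HALO, MPI_DOUBLE, north, 0, comm_up,   &reqs[1]);
        } else {                              /* thread 1 talks to the south neighbor */
            MPI_Irecv(from_south, HALO, MPI_DOUBLE, south, 0, comm_up,   &reqs[0]);
            MPI_Isend(to_south,   HALO, MPI_DOUBLE, south, 0, comm_down, &reqs[1]);
        }
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    MPI_Comm_free(&comm_up);
    MPI_Comm_free(&comm_down);
}

Because the two receives in a process land on different communicators, they fall under the first row of the point-to-point table and can proceed on parallel streams.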

SLIDE 39

APPLICATION CATEGORIES

▸ Category 1
  ▸ Direct use of parallel communication streams
  ▸ VCIs as good as user-visible endpoints and MPI everywhere
▸ Category 2
  ▸ Require shared progress
  ▸ Both VCIs and user-visible endpoints perform poorly
▸ Category 3
  ▸ Abstraction through MPI-3.1 prevents the user from expressing parallelism
  ▸ User-visible endpoints perform better than VCIs


SLIDE 40

CATEGORY 2: RMA MICROBENCHMARK

[Charts: Put message rate (x10^6 messages/s) vs. number of cores (1–16) on MPICH/OFI/OPA and MPICH/UCX/IB for MPI Everywhere, +Threads (ser_comm+orig_mpich), +Threads (par_comm+orig_mpich), +Threads (par_comm+vcis), +Threads (ser_comm+vcis), and +Threads (Endpoints)]

▸ MPI everywhere performs best because target ranks progress their VCIs.
▸ Intel OPA emulates RMA in software, requiring target-side VCI involvement.
▸ Mellanox IB implements Puts completely in hardware.

Takeaway: When shared progress is required, neither VCIs nor endpoints perform well.

SLIDE 41

CATEGORY 2: OPENMC

▸ OpenMC: distributed Monte Carlo neutron-transport code
▸ Band data equally distributed between nodes
▸ Particles distributed between nodes for simulation
▸ Each node fetches (MPI_Get) a band of data, processes its particles, and iterates

[Diagram: Rank i exposing bands 1–3, via an MPI window and via endpoints]

SLIDE 42

CATEGORY 2: OPENMC

[Charts: time (ms) vs. band size (64 B–1 MiB) on MPICH/UCX/IB and MPICH/OFI/OPA for MPI Everywhere + shared memory, +Threads (Original), +Threads (VCIs), and +Threads (Endpoints)]

VCIs as good as endpoints and MPI everywhere when shared progress not required

SLIDE 43

CATEGORY 2: OPENMC

[Charts: overall time (ms) vs. band size on MPICH/UCX/IB and MPICH/OFI/OPA (repeated from Slide 42), plus time per Get (ms) and time per Flush (ms) vs. band size on MPICH/OFI/OPA, for MPI Everywhere + shared memory, +Threads (Original), +Threads (VCIs), and +Threads (Endpoints)]

Shared progress: thread A progresses the VCI of thread B; required for correctness.

▸ VCIs are as good as endpoints and MPI everywhere when shared progress is not required.
▸ Issue of operations is fast.
▸ Shared progress hurts completion of operations.
SLIDE 44

CATEGORY 2: OPENMC

[Charts and annotations repeated from Slide 43.]

Recommendation: Maximize independence between threads for RMA communication with MPI windows.
Warning: Independent communication with VCIs and user-visible endpoints fundamentally opposes shared progress.
SLIDE 45

APPLICATION CATEGORIES

▸ Category 1
  ▸ Direct use of parallel communication streams
  ▸ VCIs as good as user-visible endpoints and MPI everywhere
▸ Category 2
  ▸ Require shared progress
  ▸ Both VCIs and user-visible endpoints perform poorly
▸ Category 3
  ▸ Abstraction through MPI-3.1 prevents the user from expressing parallelism
  ▸ User-visible endpoints perform better than VCIs


SLIDE 46

CATEGORY 3: LIMITING MPI SEMANTICS

▸ Example: a microbenchmark capturing the communication pattern in Legion's runtime
▸ Contention between the receiver thread and the sender threads with communicators; no contention with endpoints

[Chart: Isend message rate (x10^6 messages/s) vs. number of sender threads (1–16) on MPICH/OFI/OPA, using Endpoints vs. Communicators]

Takeaway: User-visible endpoints perform better than VCIs when MPI's semantics prevent the user from expressing parallelism, especially in irregular communication patterns.

SLIDE 47

CATEGORY 3: NWCHEM

▸ NWChem: quantum chemistry application suite
▸ The dominant cost is block-sparse matrix multiplication (BSPMM)
▸ C += A x B get-compute-update pattern (MPI_Get + MPI_Accumulate)
▸ Each worker on a node (thread or process) participates in BSPMM independently

SLIDE 48

CATEGORY 3: NWCHEM

[Diagram: Rank i exposes tiles 1–3 through an MPI window or through endpoints; workers issue parallel Gets and parallel Accumulates on the tiles]

SLIDE 49

CATEGORY 3: NWCHEM

[Charts: time (ms) vs. tile dimension (1–128) on MPICH/OFI/OPA for Get, Get-flush, Accum, and Accum-flush, comparing MPI Everywhere, +Threads (Original), +Threads (VCIs), and +Threads (Endpoints)]

▸ Issue of Gets is fast.
▸ Shared progress hurts completion of Get operations.
▸ VCIs are slower than endpoints at issuing Accumulates because of the single window.
▸ Endpoints complete Accumulates slower than VCIs because of shared progress.

Warning: Atomic operation semantics are not easy to achieve with multiple windows; using multiple VCIs may not help.

SLIDE 50

CATEGORY 3: NWCHEM

Warning: Atomic operation semantics are not easy to achieve with multiple windows; using multiple VCIs may not help.

Tip: If the application allows it, pass the info hint accumulate_ordering=none; the MPI library can then exploit implicit parallelism (see the sketch after this slide).

[Charts repeated from Slide 49, with +Threads (VCIs w/ hints) added]

VCIs with hints perform as well as Endpoints
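A minimal sketch (mine) of passing the tip's hint at window creation; accumulate_ordering is a standard MPI window info key, while the helper name, displacement unit, and buffer handling are illustrative placeholders.

#include <mpi.h>

/* Create a window with accumulate ordering relaxed so the MPI library may
 * issue accumulate operations to the same target in parallel. */
MPI_Win create_unordered_window(void *base, MPI_Aint bytes, MPI_Comm comm) {
    MPI_Info info;
    MPI_Win win;
    MPI_Info_create(&info);
    /* Standard window info key; "none" removes all ordering requirements
     * among accumulate operations from the same origin. */
    MPI_Info_set(info, "accumulate_ordering", "none");
    MPI_Win_create(base, bytes, 1 /* disp_unit */, info, comm, &win);
    MPI_Info_free(&info);
    return win;
}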

SLIDE 51

CONCLUDING REMARKS

▸ MPI+threads is critical for modern processors
▸ Users must proactively express logical parallelism
▸ User-visible endpoints not critical to express logical parallelism
  ▸ MPI-3.1 already features lots of parallelism
▸ VCIs perform as well as user-visible endpoints without burdening the user
▸ New info hints in MPI-4.0 give more options to express logical parallelism
  ▸ Enabling exploration of advanced mapping policies in the MPI library


SLIDE 52

THANK YOU!

Email questions to rzambre@uci.edu
