
rMPI: Message Passing on Multicore Processors with On-Chip Interconnect

  • 19 October 2009


Outline

— Background
— RAW microprocessor
— rMPI
— Evaluation
— Discussion


Why?

— Chips now offer on-chip networks
— Eases programmability
— MPI is a well-known standard
— Migrating an existing code base is easy
— Fine-grained program control when necessary


RAW overview

— Developed at MIT
— Tiled architecture (16 tiles in the ASIC implementation)
— 8-stage, in-order, single-issue pipeline
— 32 kB hardware-managed data cache
— 32 kB software-managed instruction cache
— 64 kB software-managed switch instruction memory


Architecture overview


RAW architecture

— The ISA allows direct control over the networks
— Four 32-bit networks

  • Two static, routed at compile time
  • Two dynamic, programmable at run time

— General Dynamic Network (GDN)

  • Used by rMPI (see the fragmentation sketch after this list)
  • 32-bit header
  • Messages of up to 32 words
  • Delivers each message atomically and in order
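
Because the GDN caps packets at 32 words, anything built on top of it must fragment longer messages in software. A minimal sketch of that fragmentation in C, assuming a hypothetical gdn_send_word() intrinsic and an illustrative header layout (the actual rMPI packet format appears on a later slide; on real RAW hardware the port is an ISA-level operand, not a C call):

```c
#include <stdint.h>

#define GDN_MAX_PAYLOAD 31   /* 32 words per packet, one taken by the header */

/* Hypothetical intrinsic: write one word to the GDN output port,
 * routed to a destination tile. Stands in for RAW's register-mapped
 * network port. */
void gdn_send_word(uint32_t dest_tile, uint32_t word);

/* Fragment an arbitrary-length message into GDN-sized packets, the
 * job rMPI performs in software below the MPI interface. The header
 * layout (source tile, last-packet flag, payload length) is
 * illustrative, not the real rMPI packet format. */
void gdn_send_message(uint32_t my_tile, uint32_t dest_tile,
                      const uint32_t *buf, uint32_t nwords)
{
    do {
        uint32_t chunk  = nwords < GDN_MAX_PAYLOAD ? nwords : GDN_MAX_PAYLOAD;
        uint32_t last   = (nwords == chunk);
        uint32_t header = (my_tile << 8) | (last << 7) | chunk;

        gdn_send_word(dest_tile, header);
        for (uint32_t i = 0; i < chunk; i++)
            gdn_send_word(dest_tile, *buf++);
        nwords -= chunk;
    } while (nwords > 0);
}
```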


RAW pipeline


rMPI

— An MPI implementation for RAW (minimal usage example below)
— Borrows ideas from LAM/MPI and MPICH
— About 75 KLOC!
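
Because rMPI implements the standard MPI interface, ordinary MPI code should compile for RAW unchanged. A minimal point-to-point example of the kind of program it targets:

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal MPI point-to-point example: rank 0 sends one integer to
 * rank 1. Code like this builds against any MPI implementation,
 * which is the portability argument behind rMPI. */
int main(int argc, char **argv)
{
    int rank, value = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```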


rMPI architecture


rMPI packet format


Receiving

— Uses RAW's fast interrupt handler
— The interrupt handler sorts and reassembles incoming packets (sketched below)
— Drains the network of its contents
— Interrupt-driven design:

  • Allows asynchronous communication and computation
  • Reduces network contention
  • Avoids deadlocks from blocking sends
  • No OS layer adding delay
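
A minimal sketch of that reassembly in C, assuming a hypothetical gdn_recv_word() intrinsic and the same illustrative header layout as the earlier fragmentation sketch; the real handler runs against RAW's fast interrupt mechanism and rMPI's internal matching structures:

```c
#include <stdint.h>

#define MAX_TILES     16
#define MAX_MSG_WORDS 4096

/* Hypothetical intrinsic: drain one word from the GDN input port. */
uint32_t gdn_recv_word(void);

/* Per-sender reassembly state. The GDN delivers each packet from a
 * given sender atomically and in order, so appending per sender is
 * enough to rebuild the original message. */
static struct {
    uint32_t buf[MAX_MSG_WORDS];
    uint32_t filled;                 /* words received so far */
} inflight[MAX_TILES];

/* Invoked via RAW's fast interrupt mechanism whenever GDN data
 * arrives. Draining the network immediately lets blocked senders make
 * progress, which is how the interrupt-driven design avoids deadlock.
 * Bounds checks and the MPI matching layer are omitted. */
void gdn_interrupt_handler(void)
{
    uint32_t header = gdn_recv_word();
    uint32_t src    = (header >> 8) & 0xff;  /* illustrative layout */
    uint32_t len    = header & 0x7f;
    int      last   = (header >> 7) & 1;

    for (uint32_t i = 0; i < len; i++)
        inflight[src].buf[inflight[src].filled++] = gdn_recv_word();

    if (last) {
        /* Message complete: hand the buffer to rMPI's matching layer
         * (omitted here) and reset the reassembly state. */
        inflight[src].filled = 0;
    }
}
```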


Methodology

— rMPI results collected with a simulator
— LAM/MPI reference cluster:

  • 128 nodes
  • Two 2 GHz Opterons per node, 4 GB RAM (only one CPU used)
  • 10 Gigabit Ethernet

— Speedups are relative to a single CPU on each platform running the serial implementation (see the formula below)
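
In other words, each platform is normalized against its own serial baseline:

\[
\mathrm{speedup}(n) = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}(n)}
\]

where $T_{\mathrm{serial}}$ is measured on a single CPU or tile of the same platform, so RAW and the cluster are never compared against each other's baseline.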


End-to-end overhead


End-to-end overhead comparison


Problems

— Balance between performance and programmability
— The GDN requires manual packet splitting and reassembly in software
— rMPI adds too much overhead for small messages
— Guidelines for future designers:

  • Handle packet splitting and sending automatically
  • Prevent deadlocks
  • Offer a middle ground between the raw GDN and rMPI


Performance scaling

— Jacobi relaxation

  • Low send/receive overhead
  • 16×16 to 2048×2048 matrices

— Matrix multiply
— Trapezoidal integration
— Parallel pi estimation (sketched below)
— Scalability is better for computationally intensive workloads
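
As an illustration of why these kernels scale, here is the parallel pi estimation benchmark as it is conventionally written in MPI (a sketch, not the paper's exact code):

```c
#include <mpi.h>
#include <stdio.h>

/* Classic MPI pi estimation: integrate 4/(1+x^2) over [0,1] with the
 * midpoint rule, striding the intervals across ranks. Per-rank
 * computation dominates communication (a single reduction), which is
 * why benchmarks like this scale well under rMPI. */
int main(int argc, char **argv)
{
    const long n = 1000000;          /* total intervals */
    int rank, size;
    double h, local = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    h = 1.0 / (double)n;
    for (long i = rank; i < n; i += size) {
        double x = h * ((double)i + 0.5);
        local += 4.0 / (1.0 + x * x);
    }
    local *= h;

    /* Combine the partial sums on rank 0. */
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~= %.12f\n", pi);

    MPI_Finalize();
    return 0;
}
```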


Jacobi speedup


Speedup summary


DRAM impact


Overhead


Instruction cache size


Matrix multiply


LAM/MPI latency


Discussion!!
