SLIDE 1

Extending MPI to Accelerators*

Jeff A. Stuart, John D. Owens
University of California, Davis
cpunerd@gmail.com, jowens@ece.ucdavis.edu

Pavan Balaji
Argonne National Laboratory
balaji@mcs.anl.gov

* For this presentation, we mean GPUs

SLIDE 2

Outline

  • Motivation
  • Previous Work
  • Proposal
  • Challenges
SLIDE 3

Motivation

  • HPC is no longer (just) CPU
  • GPUs have problems

– Slave device: driven entirely by the host CPU

– No system calls from device code
SLIDE 4

Previous Work

  • Three Main Works

– cudaMPI

– GAMPI

– DCGN
SLIDE 5

Previous Work

  • cudaMPI

– Handles buffer movement between host and GPU

– No ranks for GPUs (a staging sketch follows)
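
A minimal sketch of the host-staged send that a cudaMPI-style library automates. This is not cudaMPI's actual API; the function name send_from_gpu and its arguments are illustrative:

    /* Send `count` floats that live in GPU memory to MPI rank `dest`:
     * stage through a host buffer, then use ordinary MPI. The GPU
     * itself has no rank; only the owning CPU process does. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    void send_from_gpu(const float *d_buf, int count, int dest, MPI_Comm comm)
    {
        float *h_buf = (float *)malloc(count * sizeof(float));
        cudaMemcpy(h_buf, d_buf, count * sizeof(float),
                   cudaMemcpyDeviceToHost);
        MPI_Send(h_buf, count, MPI_FLOAT, dest, /* tag = */ 0, comm);
        free(h_buf);
    }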
SLIDE 6

Previous Work

  • GAMPI

– GPUs have ranks*

– More communicators (a generic split is sketched below)

– Handles buffer movement
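
GAMPI's predefined communicators are not reproduced here; as a hedged illustration of "more communicators", standard MPI can already split MPI_COMM_WORLD by device ownership (the even-rank rule below is an invented assumption):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Invented assumption: even-numbered ranks drive a GPU. */
        int drives_gpu = (world_rank % 2 == 0);

        /* Color by GPU ownership: GPU-driving ranks share one
         * communicator, the remaining CPU ranks share another. */
        MPI_Comm dev_comm;
        MPI_Comm_split(MPI_COMM_WORLD, drives_gpu, world_rank, &dev_comm);

        /* Collectives over dev_comm now touch only one class of rank. */

        MPI_Comm_free(&dev_comm);
        MPI_Finalize();
        return 0;
    }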
SLIDE 7

Previous Work

  • DCGN

– GPUs have ranks

– GPUs source/sink communication*

– Doesn't implement standard MPI
SLIDE 8

Proposal

  • Several Ideas

– No ranks for GPUs

– Multiple ranks per GPU context

– One rank per GPU context

– New MPI function(s) to spawn kernels
SLIDE 9

Proposal

  • No Ranks for GPUs

– The way things work right now (sketched below)

– No changes necessary to MPI
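
A sketch of this status quo, assuming two MPI ranks and an invented kernel named scale: the GPU only computes, and every MPI call is made by the host process under the host's rank.

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    __global__ void scale(float *x, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;
    }

    /* Run with exactly two ranks, e.g.: mpirun -np 2 ./status_quo */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;
        float *d_x, *h_x = (float *)malloc(n * sizeof(float));
        cudaMalloc(&d_x, n * sizeof(float));
        cudaMemset(d_x, 0, n * sizeof(float));

        /* The GPU is a slave device: it computes... */
        scale<<<(n + 255) / 256, 256>>>(d_x, n);
        cudaDeviceSynchronize();

        /* ...but cannot communicate; the host stages and sends. */
        cudaMemcpy(h_x, d_x, n * sizeof(float), cudaMemcpyDeviceToHost);
        if (rank == 0)
            MPI_Send(h_x, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else
            MPI_Recv(h_x, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        free(h_x);
        cudaFree(d_x);
        MPI_Finalize();
        return 0;
    }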
SLIDE 10

Proposal

  • Multiple Ranks Per Accelerator Context
  • Ranks exist for lifetime of application

– # of ranks chosen at runtime by user

  • Modifications to MPI

– Bind GPU threads to ranks

– MPI functions take a source rank

– Host must listen for requests (see the listener sketch below)

  • Extra threads on CPU (one for each GPU)
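
A sketch of the host-side listener this design implies. The Request mailbox, its fields, and the polling protocol are invented for illustration; in practice the record would live in pinned host memory mapped into the GPU so device threads can fill it in:

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Invented request record written by a GPU thread. */
    typedef struct {
        volatile int pending;  /* set by the GPU, cleared by the host */
        int source_rank;       /* rank bound to the requesting GPU thread */
        int dest, tag, count;
        float *d_buf;          /* device buffer to send */
    } Request;

    /* Runs forever on a dedicated CPU thread (one per GPU): poll the
     * mailbox, stage the device buffer, and issue the real MPI call
     * on behalf of the GPU "rank". */
    void listen(Request *req, float *h_stage)
    {
        for (;;) {
            if (!req->pending)
                continue;
            cudaMemcpy(h_stage, req->d_buf,
                       req->count * sizeof(float), cudaMemcpyDeviceToHost);
            MPI_Send(h_stage, req->count, MPI_FLOAT,
                     req->dest, req->tag, MPI_COMM_WORLD);
            req->pending = 0;  /* acknowledge back to the GPU */
        }
    }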
SLIDE 11

Proposal

  • One Rank per Accelerator Context

– Ranks exist for lifetime of application

– What is the mapping of processes to contexts?

– Can CPU processes still use MPI communication?
SLIDE 12

Proposal

  • New MPI Function(s) to Spawn Kernels
  • New communicators and ranks after every spawn

– Cleaned up after all kernels finish

  • Intercommunicator(s) available upon request (see the sketch below)
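
To make the proposal concrete, a hypothetical host-side call modeled on MPI_Comm_spawn. MPIX_Kernel_spawn, its signature, and the kernel name are all invented here; no MPI implementation defines them:

    #include <mpi.h>

    /* Invented prototype, purely illustrative. */
    int MPIX_Kernel_spawn(const char *kernel_name, int gpu_ranks,
                          MPI_Comm parent, MPI_Comm *intercomm);

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Spawn `my_kernel` with 64 GPU ranks; as with MPI_Comm_spawn,
         * the call hands back an intercommunicator linking the host
         * ranks to the newly created GPU ranks. */
        MPI_Comm gpu_intercomm;
        MPIX_Kernel_spawn("my_kernel", 64, MPI_COMM_WORLD, &gpu_intercomm);

        /* Host and GPU ranks exchange messages over gpu_intercomm; the
         * communicator and its GPU ranks are cleaned up once every
         * spawned kernel finishes. */

        MPI_Comm_free(&gpu_intercomm);
        MPI_Finalize();
        return 0;
    }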
SLIDE 13

Challenges

  • Threads vs Processes
  • Extra Communicators?
  • Collectives
  • Source/Sink Communication
SLIDE 14

Looking Forward

  • GPU-Direct is good
  • GPU-Direct 2 is great
  • We want GPU-Direct 3 to:

– Let the GPU source/sink communication

– Use GPU-Direct 2 to interface with the NIC

– Administer MPI ranks without CPU interference
SLIDE 15

One Last Note

  • Graduating with a Ph.D. in June 2012
  • Resume at http://jeff.bleugris.com/resume.pdf