Parallel programming with Session Java, Nicholas Ng (PowerPoint presentation)

SLIDE 1

Outline: Introduction; Parallel programming examples; Target architecture and benchmarks; Theory of multichannel primitives; Conclusion

Parallel programming with Session Java

Nicholas Ng (nickng@doc.ic.ac.uk)

Imperial College London

SLIDE 2

Motivation

Parallel designs are difficult and error-prone (e.g. MPI)
Session types ensure communication safety in concurrent systems
So use session types to design safe parallel algorithms for high-performance clusters

SLIDE 3

Contributions

An implementation of parallel n-body simulation

1 Programmed in Session Java (SJ), a full implementation of session types
2 Uses FPGA on the AXEL heterogeneous cluster

A formal description of multicast outwhile, inwhile SJ primitives in session types
Showed type soundness, progress property in SJ parallel programs connected in a ring topology
Proved SJ n-body implementation deadlock free

SLIDE 4

Session types

Typing system for the π-calculus [HVK98]
The π-calculus models structured interactions between processes
Main idea: every communication primitive should have a dual

Example

(Conventional type system) int i = 9
i is type int
9 is type int

Process A: cab!9; P (send 9 to B via channel cab)
Process B: cab?(x).Q (receive x from A via channel cab)
A is type Send int (or cab : ![int])
B is type Receive int (or cab : ?[int])
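The duality idea in the example can be sketched in plain Java. This is a minimal model, not SJ itself: the class `SessionDuality`, the `Action` type and the helper names are hypothetical, and a session type is assumed to be just a flat list of send/receive steps (no branching or recursion).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: these names are hypothetical and are not the SJ API.
final class SessionDuality {
    enum Dir { SEND, RECEIVE }

    /** One step of a session type: a direction plus a payload type name. */
    static final class Action {
        final Dir dir;
        final String payload;

        Action(Dir dir, String payload) {
            this.dir = dir;
            this.payload = payload;
        }
    }

    static Action send(String payload) { return new Action(Dir.SEND, payload); }

    static Action receive(String payload) { return new Action(Dir.RECEIVE, payload); }

    /** Builds the dual type by flipping every send into a receive and vice versa. */
    static List<Action> dual(List<Action> type) {
        List<Action> flipped = new ArrayList<>();
        for (Action a : type) {
            Dir d = (a.dir == Dir.SEND) ? Dir.RECEIVE : Dir.SEND;
            flipped.add(new Action(d, a.payload));
        }
        return flipped;
    }

    /** Two endpoints are compatible when every step is mirrored with the same payload type. */
    static boolean areDual(List<Action> left, List<Action> right) {
        if (left.size() != right.size()) return false;
        for (int i = 0; i < left.size(); i++) {
            Action l = left.get(i);
            Action r = right.get(i);
            if (l.dir == r.dir || !l.payload.equals(r.payload)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Action> a = Arrays.asList(send("int"));    // process A, cab : ![int]
        List<Action> b = Arrays.asList(receive("int")); // process B, cab : ?[int]
        System.out.println(areDual(a, b)); // A and B mirror each other
        System.out.println(areDual(a, a)); // two senders do not
    }
}
```

A real session type system also covers branching, selection and recursion; the flat list is only enough to show why Send int pairs with Receive int.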

SLIDE 5

Session programming with SJ

Session Java (SJ) [HYH08]
A full implementation of binary session types in Java
Provides a socket programming interface with e.g. accept(), request(), send(), receive()

Workflow of an SJ program:

1 Declare session type/protocol of program in SJ
2 SJ compiler checks local session type conformance
3 Runtime duality check with communicating program

SLIDE 6

SJ features for parallel programming

Iteration chaining
Multi-channel inwhile and outwhile in place of reduce-scatter operations

Master:

<s1,s2>.outwhile(i<42){...}

Forwarder1:

s3.outwhile(s1.inwhile){...}

Forwarder2:

s4.outwhile(s2.inwhile){...}

End:

<s3,s4>.inwhile(){...}

Figure: topology of Master, Forwarder 1, Forwarder 2 and End
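The chaining of loop conditions across the four processes can be simulated in plain Java. This sketch assumes that outwhile sends one boolean loop decision per iteration on each of its channels and that inwhile reads one; that reading of the primitives, the class `IterationChaining` and the use of in-memory queues as channels are all assumptions made for illustration, not the SJ runtime.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical simulation: queues stand in for sessions s1..s4.
final class IterationChaining {
    /** Master: <s1,s2>.outwhile(i < limit){...} sends each loop decision on both channels. */
    static void master(int limit, Queue<Boolean> s1, Queue<Boolean> s2) {
        int i = 0;
        while (true) {
            boolean go = i < limit;
            s1.add(go);
            s2.add(go);
            if (!go) break;
            i++; // the loop body would run here
        }
    }

    /** Forwarder: out.outwhile(in.inwhile){...} relays each decision downstream. */
    static int forwarder(Queue<Boolean> in, Queue<Boolean> out) {
        int iterations = 0;
        while (true) {
            boolean go = in.remove();
            out.add(go);
            if (!go) return iterations;
            iterations++;
        }
    }

    /** End: <s3,s4>.inwhile(){...} reads one decision from every incoming channel. */
    static int end(Queue<Boolean> s3, Queue<Boolean> s4) {
        int iterations = 0;
        while (true) {
            boolean a = s3.remove();
            boolean b = s4.remove();
            if (a != b) throw new IllegalStateException("incoming channels disagree");
            if (!a) return iterations;
            iterations++;
        }
    }

    /** Runs the four processes one after another; returns iterations seen by each receiver. */
    static int[] run(int limit) {
        Queue<Boolean> s1 = new ArrayDeque<>();
        Queue<Boolean> s2 = new ArrayDeque<>();
        Queue<Boolean> s3 = new ArrayDeque<>();
        Queue<Boolean> s4 = new ArrayDeque<>();
        master(limit, s1, s2);
        int f1 = forwarder(s1, s3);
        int f2 = forwarder(s2, s4);
        int e = end(s3, s4);
        return new int[] {f1, f2, e};
    }

    public static void main(String[] args) {
        int[] counts = run(42);
        System.out.println(counts[0] + " " + counts[1] + " " + counts[2]);
    }
}
```

Because the queues are unbounded, the processes can be run sequentially here; in a distributed setting each would be its own program and the same decisions would travel over sockets.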

SLIDE 7

Simple example: N-body simulation

Figure: The resultant force is the vector sum of all forces

n particles following Newton’s laws of motion

Calculate the resultant force acting on each particle
Displace the particle based on the net force acting on it
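The two steps can be sketched sequentially as follows. The slides do not give the force law or the integrator, so this assumes Newtonian gravity in two dimensions and a semi-implicit Euler update; the class `NBodyStep` and its method names are hypothetical.

```java
// Sequential sketch under stated assumptions; not the presentation's implementation.
final class NBodyStep {
    static final double G = 6.674e-11; // gravitational constant in SI units

    /** Net force on each particle: the vector sum of all pairwise gravitational forces. */
    static double[][] netForces(double[][] pos, double[] mass) {
        int n = pos.length;
        double[][] f = new double[n][2];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue; // a particle exerts no force on itself
                double dx = pos[j][0] - pos[i][0];
                double dy = pos[j][1] - pos[i][1];
                double r2 = dx * dx + dy * dy;
                double r = Math.sqrt(r2);
                double mag = G * mass[i] * mass[j] / r2;
                f[i][0] += mag * dx / r; // project the magnitude onto the x axis
                f[i][1] += mag * dy / r; // and onto the y axis
            }
        }
        return f;
    }

    /** Displaces every particle: velocity is updated from the force, then position from velocity. */
    static void step(double[][] pos, double[][] vel, double[] mass, double dt) {
        double[][] f = netForces(pos, mass);
        for (int i = 0; i < pos.length; i++) {
            vel[i][0] += f[i][0] / mass[i] * dt;
            vel[i][1] += f[i][1] / mass[i] * dt;
            pos[i][0] += vel[i][0] * dt;
            pos[i][1] += vel[i][1] * dt;
        }
    }

    public static void main(String[] args) {
        double[][] pos = {{0.0, 0.0}, {1.0, 0.0}};
        double[][] vel = {{0.0, 0.0}, {0.0, 0.0}};
        double[] mass = {1.0, 1.0};
        step(pos, vel, mass, 1.0);
        System.out.println(pos[0][0] + " " + pos[1][0]);
    }
}
```

The all-pairs loop costs O(n²) per time step, which is the work that gets divided between nodes in the parallel version. Two particles at the same position would divide by zero here; a production code would add a softening term.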

SLIDE 8

Simple example: N-body simulation

Implemented in a ring topology
3 kinds of processes: Master, Worker (multiple), LastWorker

1 Each is allocated a partition of particles
2 Calculate resultant forces on the received set of particles
3 Forward to the next node
4 Repeat until the end of one time step

Figure: ring of Master, Worker, Worker and LastWorker
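The ring scheme in steps 1 to 4 can be checked with a small sequential simulation. To stay short, this sketch uses a one-dimensional inverse-square attraction with unit masses and G = 1, which is a simplification chosen here and not taken from the slides; the class `RingNBody` is hypothetical, and array slots stand in for the nodes and their sessions.

```java
// Sequential simulation of the ring; each array index plays the role of one node.
final class RingNBody {
    /** Signed 1D inverse-square pull on a particle at x from one at y; zero distance means self. */
    static double pairForce(double x, double y) {
        double d = y - x;
        return d == 0.0 ? 0.0 : Math.signum(d) / (d * d);
    }

    /** Adds the forces exerted by the visiting particles onto the local ones. */
    static void accumulate(double[] local, double[] visiting, double[] force) {
        for (int i = 0; i < local.length; i++) {
            for (double y : visiting) force[i] += pairForce(local[i], y);
        }
    }

    /** Force phase of one time step on a ring with one node per partition. */
    static double[][] ringForces(double[][] partitions) {
        int p = partitions.length;
        double[][] force = new double[p][];
        double[][] visiting = new double[p][];
        for (int k = 0; k < p; k++) {
            force[k] = new double[partitions[k].length];
            visiting[k] = partitions[k]; // round 0: every node starts with its own set
        }
        for (int round = 0; round < p; round++) {
            for (int k = 0; k < p; k++) accumulate(partitions[k], visiting[k], force[k]);
            // Forward: node k passes the set it holds to node (k + 1) % p.
            double[][] next = new double[p][];
            for (int k = 0; k < p; k++) next[(k + 1) % p] = visiting[k];
            visiting = next;
        }
        return force; // after p rounds every set has visited every node
    }

    /** Reference result: each node sees all partitions directly, with no ring. */
    static double[][] directForces(double[][] partitions) {
        double[][] force = new double[partitions.length][];
        for (int k = 0; k < partitions.length; k++) {
            force[k] = new double[partitions[k].length];
            for (double[] other : partitions) accumulate(partitions[k], other, force[k]);
        }
        return force;
    }

    public static void main(String[] args) {
        double[][] parts = {{0.0, 1.0}, {3.0}, {7.0, 10.0}};
        double[][] ring = ringForces(parts);
        double[][] direct = directForces(parts);
        System.out.println(ring[0][0] + " vs " + direct[0][0]);
    }
}
```

The ring and direct versions add the same terms in a different order, so their results agree only up to floating-point rounding.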

SLIDE 9

Another example: Jacobi method

Iteration-based method for solving the Discrete Poisson Equation
Used in physics and natural sciences
Given an initial prediction, iterate until converged or until an upper limit of iterations is reached

Figure: A sub-matrix of calculation (values bordered by edges)

SLIDE 10

Another example: Jacobi method

Implemented in a mesh topology (2D decomposition)
9 kinds of processes: one for each edge case and a Worker in the center

1 Each is allocated a sub-matrix of values
2 Calculate the average of neighbouring values for every element
3 Exchange edges with adjacent sub-grids
4 Repeat until converged

Figure: 3×3 mesh of processes numbered 1 to 9: Master, Worker North, Worker NorthEast, Worker West, Worker, Worker East, Worker SouthWest, Worker South, Worker SouthEast
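Steps 2 and 4 can be sketched on a single sub-matrix as follows. This assumes a zero source term, so each update is exactly the average of the four neighbours, and it keeps the outermost rows and columns fixed as the edge values; the edge exchange of step 3 is not simulated. The class `JacobiSketch` and its parameters are hypothetical.

```java
// Single-process sketch of the Jacobi sweep; edges are held fixed instead of being exchanged.
final class JacobiSketch {
    /** Sweeps in place until the largest change drops below tol; returns the sweeps performed. */
    static int solve(double[][] u, double tol, int maxIters) {
        int n = u.length;
        int m = u[0].length;
        if (n < 3 || m < 3) return 0; // no interior elements to update
        double[][] next = new double[n][];
        for (int i = 0; i < n; i++) next[i] = u[i].clone(); // edge values are copied once
        for (int iter = 1; iter <= maxIters; iter++) {
            double maxDelta = 0.0;
            for (int i = 1; i < n - 1; i++) {
                for (int j = 1; j < m - 1; j++) {
                    // New value: average of the four neighbours from the previous sweep.
                    next[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]);
                    maxDelta = Math.max(maxDelta, Math.abs(next[i][j] - u[i][j]));
                }
            }
            // Copy the interior back so the next sweep reads only completed values.
            for (int i = 1; i < n - 1; i++) System.arraycopy(next[i], 1, u[i], 1, m - 2);
            if (maxDelta < tol) return iter;
        }
        return maxIters; // upper limit of iterations reached
    }

    public static void main(String[] args) {
        double[][] u = new double[5][5];
        for (int i = 0; i < 5; i++) {
            u[0][i] = 1.0;
            u[4][i] = 1.0;
            u[i][0] = 1.0;
            u[i][4] = 1.0;
        }
        int sweeps = solve(u, 1e-10, 10000);
        System.out.println(sweeps + " sweeps, centre = " + u[2][2]);
    }
}
```

In the mesh version each process would run the same sweep on its own sub-matrix and, between sweeps, replace its edge rows and columns with the values received from its neighbours.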

SLIDE 11

AXEL: a heterogeneous cluster

Axel [TL10] is a heterogeneous cluster that contains different Processing Elements (PE) on each node:
CPU: off-the-shelf multicore x86 architecture CPU
GPU: Graphics Processing Unit, nVidia Tesla, dedicated General Purpose GPU
FPGA: Field Programmable Gate Array, reconfigurable hardware

AXEL is a 16-node NNUS cluster
Each node can be used as an individual PC
Connected by high-speed Ethernet

SLIDE 12

Performance benchmark results

Against MPJ Express [SCB09], an implementation of MPI in Java
Performance is competitive (Left: N-body simulation, Right: Jacobi method)

Figure (left): n-Body simulation. Runtime (milliseconds) against number of particles per node, for Multi-channel SJ, Old SJ and MPJ Express.
Figure (right): Jacobi solution of the Discrete Poisson Equation. Runtime (seconds) against partition size, for Multi-channel SJ, Old SJ and MPJ Express.

SLIDE 13

Performance benchmark results (with FPGA)

Better performance with more particles
Best performance: SJ+FPGA 2x faster than SJ implementation

Figure: Runtime (milliseconds) against number of particles, for SJ + FPGA, SJ and MPJExpress.

SLIDE 14

Well-formed topology

Multichannel inwhile and outwhile are not safe on their own

Well-formed topology:
Topology constructed as a DAG with 1 root node and 1 sink node
Individual pairs of sessions are dual
Iteration controlled by a single condition in the Master node

Deadlock freedom for a group of processes in a well-formed topology

Figure: topology with nodes numbered 1 to 9
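The structural part of the well-formedness condition (a DAG with exactly one root and one sink) can be checked mechanically. The sketch below uses Kahn's topological sort for the acyclicity test; it covers only the graph shape, not session duality or the single loop condition, and the class `TopologyCheck` and the edge-list encoding are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Checks graph shape only; nodes are 0..n-1 and each edge is {from, to}.
final class TopologyCheck {
    /** True when the graph is acyclic and has exactly one root and exactly one sink. */
    static boolean hasWellFormedShape(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        int[] in = new int[n];
        int[] out = new int[n];
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            out[e[0]]++;
            in[e[1]]++;
        }
        int roots = 0;
        int sinks = 0;
        for (int i = 0; i < n; i++) {
            if (in[i] == 0) roots++;  // no incoming session: a root
            if (out[i] == 0) sinks++; // no outgoing session: a sink
        }
        if (roots != 1 || sinks != 1) return false;

        // Kahn's algorithm: repeatedly remove nodes whose predecessors are all removed.
        int[] remaining = in.clone();
        ArrayDeque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < n; i++) if (remaining[i] == 0) ready.add(i);
        int visited = 0;
        while (!ready.isEmpty()) {
            int u = ready.remove();
            visited++;
            for (int v : adj.get(u)) {
                if (--remaining[v] == 0) ready.add(v);
            }
        }
        return visited == n; // nodes on a cycle are never removed
    }

    public static void main(String[] args) {
        // Master(0) -> Forwarder1(1), Forwarder2(2) -> End(3)
        int[][] diamond = {{0, 1}, {0, 2}, {1, 3}, {2, 3}};
        System.out.println(hasWellFormedShape(4, diamond));
    }
}
```

A 3×3 mesh whose sessions all point away from the Master corner passes the same check, with the opposite corner as the sink.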

SLIDE 15

Future (and ongoing) work

C-based language implementing session types
Higher performance with FPGA or other acceleration hardware
Can integrate with AXEL or similar HPC applications toolchain

SLIDE 16

SLIDE 17

References

[HVK98] Kohei Honda, Vasco T. Vasconcelos, and Makoto Kubo. Language primitives and type disciplines for structured communication-based programming. In ESOP’98, volume 1381 of LNCS, pages 122–138, 1998.

[HYH08] Raymond Hu, Nobuko Yoshida, and Kohei Honda. Session-based distributed programming in Java. In ECOOP’08, volume 5142 of LNCS, pages 516–541, 2008.

[SCB09] Aamir Shafi, Bryan Carpenter, and Mark Baker. Nested Parallelism for Multi-core HPC Systems using Java. Journal of Parallel and Distributed Computing, 69(6):532–545, 2009.

[TL10] Kuen Hung Tsoi and Wayne Luk. Axel: a heterogeneous cluster with FPGAs and GPUs. In FPGA ’10: Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, pages 115–124, New York, NY, USA, 2010. ACM.