PARALLEL PROGRAMMING IN GO FOR PERFORMANCE WITH THE PARGO LIBRARY

SLIDE 1

PARALLEL PROGRAMMING IN GO FOR PERFORMANCE WITH THE PARGO LIBRARY

PASCAL COSTANZA

SLIDE 2

WHAT IS PARGO?

§ Pargo is a library for parallel programming in Go at imec’s ExaScience Lab:
  § based on our experiences with parallel programming in C++, Common Lisp, and Java
  § released under a BSD-style open source license at https://github.com/exascience/pargo
§ Pargo supports numerous common parallel programming patterns:
  § Divide-and-conquer task-based parallelism
  § Parallel ranges, parallel reduction, parallel Boolean functions
  § Speculative parallelism
  § Parallel Quicksort and Mergesort
  § Parallel hash table
  § Parallel pipelines inspired by Java Parallel Streams introduced in JDK 8
    § including support for contexts, cancellation, and Go-style error handling

SLIDE 3

CONCURRENCY VS. PARALLELISM

§ Concurrency is part of the problem domain.
§ Needs solution even without multicore/node.
§ Go is really good at this!

§ Parallelism is part of the solution domain.
§ Only needed for performance.
§ Pargo is really good at this! ;)

https://xkcd.com/726/
Gary W. Sabot, The Paralation Model, The MIT Press, 1988

SLIDE 4

PARALLEL PROGRAMMING: AN EXAMPLE

SLIDE 5

PARALLEL PROGRAMMING: AN EXAMPLE

SLIDE 6

PARALLEL PROGRAMMING: AN EXAMPLE

SLIDE 7

PARALLEL PROGRAMMING: AN EXAMPLE

SLIDE 8

PARALLEL PROGRAMMING: AN EXAMPLE

SLIDE 9

PARALLEL PROGRAMMING: AN EXAMPLE

SLIDE 10

PARALLEL PROGRAMMING: AN EXAMPLE

SLIDE 11

DIVIDE-AND-CONQUER TASK-BASED PARALLELISM

sum[0:32]
sum[0:16] sum[16:32]
sum[0:8] sum[8:16] sum[16:24] sum[24:32]
sum[0:4] sum[4:8] sum[8:12] sum[12:16] sum[16:20] sum[20:24] sum[24:28] sum[28:32]
sum[0:2] sum[2:4] sum[4:6] sum[6:8] sum[8:10] sum[10:12] sum[12:14] sum[14:16] sum[16:18] sum[18:20] sum[20:22] sum[22:24] sum[24:26] sum[26:28] sum[28:30] sum[30:32]
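
The splitting tree above can be sketched with plain goroutines. This is a minimal illustration using only the standard library, not Pargo's implementation; the function name parallelSum and the grain parameter are assumptions made for this example. Each call splits its range in half, hands one half to a new goroutine, and sums ranges of at most grain elements sequentially.

```go
package main

import "sync"

// parallelSum adds up xs by recursively splitting it in half, mirroring the
// sum[0:32] -> sum[0:16] + sum[16:32] -> ... tree.
// Ranges of at most grain elements are summed sequentially.
// (Hypothetical helper for illustration; not part of Pargo.)
func parallelSum(xs []int, grain int) int {
	if grain < 1 {
		grain = 1
	}
	if len(xs) <= grain {
		total := 0
		for _, x := range xs {
			total += x
		}
		return total
	}
	mid := len(xs) / 2
	var left int
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // the left half becomes a new task
		defer wg.Done()
		left = parallelSum(xs[:mid], grain)
	}()
	right := parallelSum(xs[mid:], grain) // the right half stays in this goroutine
	wg.Wait()                             // join: left is safe to read after this
	return left + right
}
```

Calling parallelSum(xs, 2) on a 32-element slice produces the five-level tree shown above; a larger grain cuts the tree off earlier and creates fewer tasks.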

SLIDE 12

DIVIDE-AND-CONQUER TASK-BASED PARALLELISM

SLIDE 13

DIVIDE-AND-CONQUER TASK-BASED PARALLELISM

WITH 16 CORES

SLIDE 14

DIVIDE-AND-CONQUER TASK-BASED PARALLELISM

WITH 4 CORES

SLIDE 15

DIVIDE-AND-CONQUER TASK-BASED PARALLELISM

WITH LOAD IMBALANCE

SLIDE 16

DIVIDE-AND-CONQUER TASK-BASED PARALLELISM

§ Task-based parallelism allows flexible distribution of work over CPU cores.
§ Distributing work evenly over cores is often not optimal, because of load imbalance.
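
One simple way to see why an even split can lose is to compare it with workers that claim small chunks on demand. The sketch below uses a shared atomic counter for that; it is dynamic self-scheduling rather than work stealing, it uses only the standard library, and the name forEachDynamic and its parameters are assumptions made for this example.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// forEachDynamic calls work(i) for every i in [0, n). Instead of giving each
// worker one fixed share of the range, workers repeatedly claim the next
// chunk of indices from a shared counter, so a worker that drew cheap items
// simply claims more chunks than one that drew expensive items.
// (Hypothetical helper for illustration; not part of Pargo.)
func forEachDynamic(n, workers, chunk int, work func(i int)) {
	if workers < 1 {
		workers = 1
	}
	if chunk < 1 {
		chunk = 1
	}
	var next int64 // first index that has not been claimed yet
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				// Atomically reserve [start, start+chunk).
				start := int(atomic.AddInt64(&next, int64(chunk))) - chunk
				if start >= n {
					return
				}
				end := start + chunk
				if end > n {
					end = n
				}
				for i := start; i < end; i++ {
					work(i)
				}
			}
		}()
	}
	wg.Wait()
}
```

Smaller chunks balance the load better but touch the shared counter more often, which is the same trade-off a task scheduler faces when choosing task granularity.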

SLIDE 17

DIVIDE-AND-CONQUER TASK-BASED PARALLELISM

§ Task-based parallelism allows flexible distribution of work over CPU cores.
§ …but how are tasks actually scheduled over the cores?

SLIDE 18

WORK STEALING

§ Work stealing is known to be optimal both in theory and practice
  § Blumofe, Leiserson: Scheduling Multithreaded Computations by Work Stealing, Journal of the ACM, 1999
  § Frigo, Leiserson, Randall: The Implementation of the Cilk-5 Multithreaded Language, PLDI’98
§ Successfully implemented in many languages and libraries:
  § Cilk for C; Threading Building Blocks for C++; Java fork/join; …
  § …and Go
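
The core data structure behind work stealing is a per-worker double-ended queue: the owning worker pushes and pops tasks at one end, and idle workers steal from the other end. The sketch below is a deliberately simplified, mutex-protected version for illustration only. The names Task, Deque, PushBottom, PopBottom, and StealTop are assumptions made for this example, and neither the Go runtime nor Pargo is claimed to be implemented this way; real schedulers typically use lock-free deques.

```go
package main

import "sync"

// Task is a unit of work. (Illustrative name.)
type Task func()

// Deque is a per-worker task queue. The owner works at the bottom (LIFO),
// which keeps recently created, cache-warm tasks local. Thieves take from
// the top (FIFO), where in divide-and-conquer code the oldest and usually
// largest subproblems sit.
type Deque struct {
	mu    sync.Mutex
	tasks []Task
}

// PushBottom is called by the owning worker when it spawns a task.
func (d *Deque) PushBottom(t Task) {
	d.mu.Lock()
	d.tasks = append(d.tasks, t)
	d.mu.Unlock()
}

// PopBottom is called by the owning worker to get its most recent task.
func (d *Deque) PopBottom() (Task, bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	n := len(d.tasks)
	if n == 0 {
		return nil, false
	}
	t := d.tasks[n-1]
	d.tasks[n-1] = nil // drop the reference so the task can be collected
	d.tasks = d.tasks[:n-1]
	return t, true
}

// StealTop is called by an idle worker to take the oldest task.
func (d *Deque) StealTop() (Task, bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if len(d.tasks) == 0 {
		return nil, false
	}
	t := d.tasks[0]
	d.tasks[0] = nil
	d.tasks = d.tasks[1:]
	return t, true
}
```

A worker loop built on this would call PopBottom on its own deque first and fall back to StealTop on another worker's deque only when its own is empty, so stealing happens only when a core would otherwise be idle.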

Wonder Gopher by Ashley McNamara, https://github.com/ashleymcnamara/gophers

SLIDE 19

WORK STEALING FINDS OPTIMAL DISTRIBUTION ON THE FLY

SLIDE 20

PARALLEL PROGRAMMING: AN EXAMPLE

SLIDE 21

THE EXAMPLE PROGRAM IN PARGO

SLIDE 22

THE EXAMPLE PROGRAM IN PARGO

SLIDE 23

THE EXAMPLE PROGRAM IN PARGO

SLIDE 24

THE EXAMPLE PROGRAM IN PARGO

SLIDE 25

…AND LOTS OF OTHER PARALLEL PROGRAMMING PATTERNS

§ Parallel Do
§ Parallel range
§ Parallel reduction over int, float64, string, interface{}
§ Parallel range reduction over int, float64, string, interface{}
§ Parallel And, Or, RangeAnd, RangeOr
§ Speculative variants of many of the above functions
§ Sequential variants for debugging
§ Parallel Quicksort and Mergesort
§ A parallel hash table (similar to Go’s sync.Map)
§ …and parallel pipelines.
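
To make the idea of a speculative parallel Or concrete, here is a minimal sketch using only the standard library. It is not Pargo's API: the function name anyMatch and the parts parameter are assumptions made for this example. The slice is checked in parallel chunks, and the first chunk to find a match cancels a shared context so the remaining chunks stop early.

```go
package main

import (
	"context"
	"sync"
)

// anyMatch reports whether pred holds for at least one element of xs.
// The slice is cut into up to `parts` chunks that are checked in parallel.
// As soon as one chunk finds a match it cancels the context, and the other
// chunks give up at their next element.
// (Hypothetical helper for illustration; not part of Pargo.)
func anyMatch(xs []int, parts int, pred func(int) bool) bool {
	if parts < 1 {
		parts = 1
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	size := (len(xs) + parts - 1) / parts // chunk length, rounded up
	var (
		wg    sync.WaitGroup
		once  sync.Once
		found bool
	)
	for start := 0; start < len(xs); start += size {
		end := start + size
		if end > len(xs) {
			end = len(xs)
		}
		wg.Add(1)
		go func(chunk []int) {
			defer wg.Done()
			for _, x := range chunk {
				if ctx.Err() != nil {
					return // another chunk already found a match
				}
				if pred(x) {
					once.Do(func() { found = true })
					cancel()
					return
				}
			}
		}(xs[start:end])
	}
	wg.Wait()
	return found
}
```

The speculation pays off when a match is found early; when there is no match, every chunk is scanned in full and the result is the same as for a non-speculative parallel Or.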

SLIDE 26

CONCURRENT PIPELINES IN GO

SLIDE 27

CONCURRENT PIPELINES IN GO

SLIDE 28

CONCURRENT PIPELINES IN GO

SLIDE 29

CONCURRENT PIPELINES IN GO

SLIDE 30

PARALLEL PIPELINES USING PARGO

SLIDE 31

PARALLEL PIPELINES USING PARGO

SLIDE 32

PARALLEL PIPELINES USING PARGO

SLIDE 33

PARALLEL PIPELINES USING PARGO

SLIDE 34

PARALLEL PIPELINES USING PARGO

SLIDE 35

PARALLEL PIPELINES IN PARGO

§ Predefined pipeline sources for arrays, slices, strings, channels, and bufio.Scanner.
§ Support for user-defined sources through the pipeline.Source interface.
§ Support for several kinds of nodes (stages):
  § Sequential, ordered, parallel
  § Strictly ordered, limited parallel
  § Skip and Limit nodes
§ Support for several kinds of filters:
  § Generic receive and finalize
  § Boolean filters: Every, Some, NotEvery, NotAny
  § Counting filter
  § Slice filter
§ Support for contexts, including cancellation
§ Support for error handling, including cancellation on error
§ Support for fine-tuning of batch sizes
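
The interplay of batches, a parallel stage, and an ordered stage can be sketched with channels from the standard library. This is an illustration of the general technique only and does not use or reproduce Pargo's pipeline package; the names batch and runPipeline and all parameters are assumptions made for this example.

```go
package main

import "sync"

// batch is a run of consecutive items tagged with its position in the input.
type batch struct {
	seq  int
	data []int
}

// runPipeline cuts src into batches of batchSize items, applies f to every
// item using `workers` goroutines (a parallel stage), and then reassembles
// the batches by sequence number (an ordered stage).
// (Hypothetical helper for illustration; not part of Pargo.)
func runPipeline(src []int, batchSize, workers int, f func(int) int) []int {
	if batchSize < 1 {
		batchSize = 1
	}
	if workers < 1 {
		workers = 1
	}
	in := make(chan batch)
	out := make(chan batch)

	// Source: emit batches in input order.
	go func() {
		defer close(in)
		for seq, start := 0, 0; start < len(src); seq, start = seq+1, start+batchSize {
			end := start + batchSize
			if end > len(src) {
				end = len(src)
			}
			in <- batch{seq: seq, data: src[start:end]}
		}
	}()

	// Parallel stage: batches may finish in any order.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range in {
				res := make([]int, len(b.data))
				for i, x := range b.data {
					res[i] = f(x)
				}
				out <- batch{seq: b.seq, data: res}
			}
		}()
	}
	go func() { wg.Wait(); close(out) }()

	// Ordered stage: hold early arrivals until their turn comes.
	pending := make(map[int][]int)
	result := make([]int, 0, len(src))
	next := 0
	for b := range out {
		pending[b.seq] = b.data
		for data, ok := pending[next]; ok; data, ok = pending[next] {
			result = append(result, data...)
			delete(pending, next)
			next++
		}
	}
	return result
}
```

The batch size controls the trade-off that fine-tuning addresses: larger batches reduce channel traffic per item, while smaller batches spread uneven work more evenly over the workers.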

SLIDE 36

Go Gopher image by Renee French, CC BY 3.0, https://creativecommons.org/licenses/by/3.0/

ELPREP: A HIGH-PERFORMANCE TOOL FOR SEQUENCING

§ High-performance tool for preparing SAM files for variant calling.
§ Multi-threaded application that runs entirely in RAM and merges multiple steps to avoid repeated file I/O.
§ Can improve performance by up to a factor of 10 compared to standard tools.
§ elPrep has been implemented in Go since version 3.0
  § https://github.com/exascience/elprep

[Chart: runtime (axis from 20m to 2h) for Picard/Samtools, elPrep, elPrep (merged), elPrep (max RAM), and elPrep (max RAM + merged), broken down by step: sort by coordinates, filter unmapped reads, mark duplicates, add read groups, filter sequence dictionary, merged]

SLIDE 37

PARGO

§ Pargo available at https://github.com/exascience/pargo
§ Documentation: https://godoc.org/github.com/exascience/pargo
§ More documentation: https://github.com/exascience/pargo/wiki
§ elPrep: https://github.com/exascience/elprep
