GrPPI: Generic Reusable Parallel Patterns Interface, ARCOS Group (PowerPoint PPT presentation)



slide-1
SLIDE 1

GrPPI

GrPPI

Generic Reusable Parallel Patterns Interface

ARCOS Group University Carlos III of Madrid Spain

January 2018

(CC BY-NC-ND)

1/105

slide-2
SLIDE 2

GrPPI

Warning

This work is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license. You are free to Share: copy and redistribute the material in any medium or format.

BY: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

NC: You may not use the material for commercial purposes.

ND: If you remix, transform, or build upon the material, you may not distribute the modified material.


slide-3
SLIDE 3

GrPPI

ARCOS@uc3m

UC3M: A young, international, research-oriented university. ARCOS: An applied research group.

Research lines: High Performance Computing, Big Data, Cyber-physical Systems, and programming models for application improvement.

Improving applications:

REPARA: Reengineering and Enabling Performance and poweR of Applications. Funded by the European Commission (FP7), 2013–2016.
RePhrase: REfactoring Parallel Heterogeneous Resource Aware Applications. Funded by the European Commission (H2020), 2015–2018.

Standardization:

ISO/IEC JTC1/SC22/WG21. ISO C++ standards committee.


slide-4
SLIDE 4

GrPPI

Acknowledgements

The GrPPI library has been partially supported by:

Project ICT 644235 “REPHRASE: REfactoring Parallel Heterogeneous Resource-aware Applications”, funded by the European Commission through the H2020 program (2015-2018).
Project TIN2016-79673-P “Towards Unification of HPC and Big Data Paradigms”, funded by the Spanish Ministry of Economy and Competitiveness (2016-2019).


slide-5
SLIDE 5

GrPPI

GrPPI team

Main team

  • J. Daniel Garcia (UC3M, lead).
  • David del Río (UC3M).
  • Manuel F. Dolz (UC3M).
  • Javier Fernández (UC3M).
  • Javier Garcia Blas (UC3M).

Cooperation

  • Plácido Fernández (UC3M-CERN).
  • Marco Danelutto (Univ. Pisa).
  • Massimo Torquati (Univ. Pisa).
  • Marco Aldinucci (Univ. Torino).
  • . . .


slide-6
SLIDE 6

GrPPI Introduction

1 Introduction
2 Data patterns
3 Task Patterns
4 Streaming patterns
5 Writing your own execution
6 Evaluation
7 Conclusions


slide-7
SLIDE 7

GrPPI Introduction Parallel Programming

1 Introduction
  Parallel Programming
  Design patterns and parallel patterns
  GrPPI architecture


slide-8
SLIDE 8

GrPPI Introduction Parallel Programming

Thinking in Parallel is hard.


slide-9
SLIDE 9

GrPPI Introduction Parallel Programming

Thinking is hard.

Yale Patt


slide-10
SLIDE 10

GrPPI Introduction Parallel Programming

Sequential Programming versus Parallel Programming

Sequential programming

Well-known set of control-structures embedded in programming languages. Control structures inherently sequential.


slide-11
SLIDE 11

GrPPI Introduction Parallel Programming

Sequential Programming versus Parallel Programming

Sequential programming

Well-known set of control-structures embedded in programming languages. Control structures inherently sequential.

Parallel programming

Constructs adapting sequential control structures to the parallel world (e.g. parallel-for).


slide-12
SLIDE 12

GrPPI Introduction Parallel Programming

Sequential Programming versus Parallel Programming

Sequential programming

Well-known set of control-structures embedded in programming languages. Control structures inherently sequential.

Parallel programming

Constructs adapting sequential control structures to the parallel world (e.g. parallel-for).

But wait!

What if we had constructs that could be both sequential and parallel?


slide-13
SLIDE 13

GrPPI Introduction Design patterns and parallel patterns

1 Introduction
  Parallel Programming
  Design patterns and parallel patterns
  GrPPI architecture


slide-14
SLIDE 14

GrPPI Introduction Design patterns and parallel patterns

Software design

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

C.A.R. Hoare


slide-15
SLIDE 15

GrPPI Introduction Design patterns and parallel patterns

A brief history of patterns

From building and architecture (Christopher Alexander):

1977: A Pattern Language: Towns, Buildings, Construction. 1979: The Timeless Way of Building.


slide-16
SLIDE 16

GrPPI Introduction Design patterns and parallel patterns

A brief history of patterns

From building and architecture (Christopher Alexander):

1977: A Pattern Language: Towns, Buildings, Construction. 1979: The Timeless Way of Building.

To software design (Gamma et al.):

1993: Design patterns: abstraction and reuse of object-oriented design. ECOOP. 1995: Design Patterns: Elements of Reusable Object-Oriented Software.


slide-17
SLIDE 17

GrPPI Introduction Design patterns and parallel patterns

A brief history of patterns

From building and architecture (Christopher Alexander):

1977: A Pattern Language: Towns, Buildings, Construction. 1979: The Timeless Way of Building.

To software design (Gamma et al.):

1993: Design patterns: abstraction and reuse of object-oriented design. ECOOP. 1995: Design Patterns: Elements of Reusable Object-Oriented Software.

To parallel programming (McCool, Reinders, Robinson):

2012: Structured Parallel Programming: Patterns for Efficient Computation.


slide-18
SLIDE 18

GrPPI Introduction GrPPI architecture

1 Introduction
  Parallel Programming
  Design patterns and parallel patterns
  GrPPI architecture


slide-19
SLIDE 19

GrPPI Introduction GrPPI architecture

Some ideals

Applications should be expressed independently of the execution model.


slide-20
SLIDE 20

GrPPI Introduction GrPPI architecture

Some ideals

Applications should be expressed independently of the execution model. Multiple back-ends should be offered with simple switching mechanisms.


slide-21
SLIDE 21

GrPPI Introduction GrPPI architecture

Some ideals

Applications should be expressed independently of the execution model. Multiple back-ends should be offered with simple switching mechanisms. Interface should integrate seamlessly with modern C++ standard library.


slide-22
SLIDE 22

GrPPI Introduction GrPPI architecture

Some ideals

Applications should be expressed independently of the execution model. Multiple back-ends should be offered with simple switching mechanisms. Interface should integrate seamlessly with modern C++ standard library. Make use of modern (C++14) language features.


slide-23
SLIDE 23

GrPPI Introduction GrPPI architecture

GrPPI https://github.com/arcosuc3m/grppi


slide-24
SLIDE 24

GrPPI Introduction GrPPI architecture

GrPPI https://github.com/arcosuc3m/grppi

A header-only library (might change). A set of execution policies. A set of type-safe generic algorithms. Requires C++14. GNU GPL v3.


slide-25
SLIDE 25

GrPPI Introduction GrPPI architecture

Setting up GrPPI

Structure.

include: Include files.
unit_tests: Unit tests using GoogleTest.
samples: Sample programs.
cmake-modules: Extra CMake scripts.


slide-26
SLIDE 26

GrPPI Introduction GrPPI architecture

Setting up GrPPI

Structure.

include: Include files.
unit_tests: Unit tests using GoogleTest.
samples: Sample programs.
cmake-modules: Extra CMake scripts.

Initial setup

mkdir build
cd build
cmake ..
make


slide-27
SLIDE 27

GrPPI Introduction GrPPI architecture

CMake variables

GRPPI_UNIT_TESTS_ENABLE: Enable building unit tests.
GRPPI_OMP_ENABLE: Enable OpenMP back-end.
GRPPI_TBB_ENABLE: Enable Intel TBB back-end.
GRPPI_EXAMPLE_APPLICATIONS_ENABLE: Enable building example applications.
GRPPI_DOXY_ENABLE: Enable documentation generation.


slide-28
SLIDE 28

GrPPI Introduction GrPPI architecture

Execution policies

The execution model is encapsulated by execution values. Current execution types:

sequential_execution.
parallel_execution_native.
parallel_execution_omp.
parallel_execution_tbb.
dynamic_execution.

All top-level patterns take one execution object.


slide-29
SLIDE 29

GrPPI Introduction GrPPI architecture

Concurrency degree

Sets the number of underlying threads used by the execution implementation.

sequential_execution ⇒ 1
parallel_execution_native ⇒ hardware_concurrency().
parallel_execution_omp ⇒ omp_get_num_threads().

API

ex.set_concurrency_degree(4);
int n = ex.concurrency_degree();


slide-30
SLIDE 30

GrPPI Introduction GrPPI architecture

Dynamic back-end

Useful if you want to take the decision at run-time. Holds any other execution policy (or empty).


slide-31
SLIDE 31

GrPPI Introduction GrPPI architecture

Dynamic back-end

Useful if you want to take the decision at run time. Holds any other execution policy (or empty).

Selecting the execution back-end

grppi::dynamic_execution execution_mode(const std::string & opt) {
  using namespace grppi;
  if ("seq" == opt) return sequential_execution{};
  if ("thr" == opt) return parallel_execution_native{};
  if ("omp" == opt) return parallel_execution_omp{};
  if ("tbb" == opt) return parallel_execution_tbb{};
  return {};
}


slide-32
SLIDE 32

GrPPI Introduction GrPPI architecture

Function objects

GrPPI is heavily based on passing code sections as function objects (aka functors). Alternatives:

Standard C++ predefined functors (e.g. std::plus<int>). Custom hand-written function objects. Lambda expressions.

Usually lambda expressions lead to more concise code.
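The three alternatives can be seen side by side in plain C++ (a minimal sketch, independent of GrPPI; the names `triple` and `demo` are illustrative):

```cpp
#include <cassert>
#include <functional>

// Custom hand-written function object (functor).
struct triple {
  int operator()(int x) const { return 3 * x; }
};

int demo() {
  std::plus<int> add;                          // standard predefined functor
  auto square = [](int x) { return x * x; };   // lambda expression
  return add(triple{}(2), square(3));          // 6 + 9 = 15
}
```

Anywhere GrPPI expects a transformer or combiner, any of these callables can be passed interchangeably.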


slide-33
SLIDE 33

GrPPI Data patterns

1 Introduction
2 Data patterns
3 Task Patterns
4 Streaming patterns
5 Writing your own execution
6 Evaluation
7 Conclusions


slide-34
SLIDE 34

GrPPI Data patterns Map pattern

2 Data patterns
  Map pattern
  Reduce pattern
  Map/reduce pattern
  Stencil pattern


slide-35
SLIDE 35

GrPPI Data patterns Map pattern

Maps on data sequences

A map pattern applies an operation to every element in a tuple of data sets, generating a new data set. Given:

A sequence x^1_1, x^1_2, . . . , x^1_N ∈ T1,
A sequence x^2_1, x^2_2, . . . , x^2_N ∈ T2,
. . . , and
A sequence x^M_1, x^M_2, . . . , x^M_N ∈ TM,
A function f : T1 × T2 × . . . × TM → U

It generates the sequence:

f(x^1_1, x^2_1, . . . , x^M_1), f(x^1_2, x^2_2, . . . , x^M_2), . . . , f(x^1_N, x^2_N, . . . , x^M_N)


slide-36
SLIDE 36

GrPPI Data patterns Map pattern

Maps on data sequences


slide-37
SLIDE 37

GrPPI Data patterns Map pattern

Unidimensional maps

map pattern on a single input data set. Given:

A sequence x1, x2, . . . , xN ∈ T A function f : T → U

It generates the sequence:

f(x1), f(x2), . . . , f(xN)
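Sequentially, this pattern is just a transform; a minimal sketch (not the GrPPI implementation, which distributes the per-element applications across the chosen execution policy):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sequential sketch of the unidimensional map pattern: applies f to every
// element of v and collects the results in a new sequence.
template <typename T, typename F>
auto sequential_map(const std::vector<T> & v, F f) {
  std::vector<decltype(f(v.front()))> res(v.size());
  std::transform(v.begin(), v.end(), res.begin(), f);
  return res;
}
```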


slide-38
SLIDE 38

GrPPI Data patterns Map pattern

Key element

Transformer operation: Any operation that can perform the transformation for a data item.


slide-39
SLIDE 39

GrPPI Data patterns Map pattern

Key element

Transformer operation: Any operation that can perform the transformation for a data item. UnaryTransformer: Any C++ callable entity that takes a data item and returns the transformed value.

auto square = [](auto x) { return x*x; };
auto length = [](const std::string & s) { return s.length(); };


slide-40
SLIDE 40

GrPPI Data patterns Map pattern

Key element

Transformer operation: Any operation that can perform the transformation for a data item. UnaryTransformer: Any C++ callable entity that takes a data item and returns the transformed value.

auto square = [](auto x) { return x*x; };
auto length = [](const std::string & s) { return s.length(); };

MultiTransformer: Any C++ callable entity that takes multiple data items and returns the transformed value.

auto normalize = [](double x, double y) { return sqrt(x*x + y*y); };
auto min = [](int x, int y, int z) { return std::min({x, y, z}); };


slide-41
SLIDE 41

GrPPI Data patterns Map pattern

Single sequences mapping

Double all elements in sequence

template <typename Execution>
std::vector<double> double_elements(const Execution & ex,
    const std::vector<double> & v) {
  std::vector<double> res(v.size());
  grppi::map(ex, v.begin(), v.end(), res.begin(),
      [](double x) { return 2*x; });
  return res;
}


slide-42
SLIDE 42

GrPPI Data patterns Map pattern

Multiple sequences mapping

Add two vectors

template <typename Execution>
std::vector<double> add_vectors(const Execution & ex,
    const std::vector<double> & v1, const std::vector<double> & v2) {
  auto size = std::min(v1.size(), v2.size());
  std::vector<double> res(size);
  grppi::map(ex, v1.begin(), v1.begin() + size, res.begin(),
      [](double x, double y) { return x+y; },
      v2.begin());
  return res;
}


slide-43
SLIDE 43

GrPPI Data patterns Map pattern

Multiple sequences mapping

Add three vectors

template <typename Execution>
std::vector<double> add_vectors(const Execution & ex,
    const std::vector<double> & v1, const std::vector<double> & v2,
    const std::vector<double> & v3) {
  auto size = std::min({v1.size(), v2.size(), v3.size()});
  std::vector<double> res(size);
  grppi::map(ex, v1.begin(), v1.begin() + size, res.begin(),
      [](double x, double y, double z) { return x+y+z; },
      v2.begin(), v3.begin());
  return res;
}


slide-44
SLIDE 44

GrPPI Data patterns Map pattern

Heterogeneous mapping

The result can be of a different type.

Complex vector from real and imaginary vectors

template <typename Execution>
std::vector<std::complex<double>> create_cplx(const Execution & ex,
    const std::vector<double> & re, const std::vector<double> & im) {
  auto size = std::min(re.size(), im.size());
  std::vector<std::complex<double>> res(size);
  grppi::map(ex, re.begin(), re.begin() + size, res.begin(),
      [](double r, double i) -> std::complex<double> { return {r,i}; },
      im.begin());
  return res;
}


slide-45
SLIDE 45

GrPPI Data patterns Reduce pattern

2 Data patterns
  Map pattern
  Reduce pattern
  Map/reduce pattern
  Stencil pattern


slide-46
SLIDE 46

GrPPI Data patterns Reduce pattern

Reductions on data sequences

A reduce pattern combines all values in a data set using a binary combination operation. Given:

A sequence x1, x2, . . . , xN ∈ T.
An identity value id ∈ I.
A combine operation c : I × T → I, satisfying:

c(c(x, y), z) ≡ c(x, c(y, z))
c(id, x) = x̄, where x̄ is the value of x in I.
c(id, c(id, x)) = c(id, x)
c(c(c(id, x), y), c(c(id, z), t)) = c(c(c(c(id, x), y), z), t)

It generates the value:

c(. . . c(c(id, x1), x2) . . . , xN)
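Sequentially, this is a left fold starting from the identity value; a minimal sketch (not the GrPPI implementation, which may reduce chunks in parallel and combine the partial results):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sequential sketch of the reduce pattern: folds the sequence with the
// combine operation c, starting from the identity value id.
template <typename T, typename I, typename Combiner>
I sequential_reduce(const std::vector<T> & v, I id, Combiner c) {
  return std::accumulate(v.begin(), v.end(), id, c);
}
```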


slide-47
SLIDE 47

GrPPI Data patterns Reduce pattern

Reductions on data sequences


slide-48
SLIDE 48

GrPPI Data patterns Reduce pattern

Homogeneous reduction

Add a sequence of values

template <typename Execution>
double add_sequence(const Execution & ex, const std::vector<double> & v) {
  return grppi::reduce(ex, v.begin(), v.end(), 0.0,
      [](double x, double y) { return x+y; });
}


slide-49
SLIDE 49

GrPPI Data patterns Reduce pattern

Heterogeneous reduction

Add lengths of sequence of strings

template <typename Execution>
int add_lengths(const Execution & ex,
    const std::vector<std::string> & words) {
  return grppi::reduce(ex, words.begin(), words.end(), 0,
      [](int n, const std::string & w) { return n + static_cast<int>(w.length()); });
}


slide-50
SLIDE 50

GrPPI Data patterns Map/reduce pattern

2 Data patterns
  Map pattern
  Reduce pattern
  Map/reduce pattern
  Stencil pattern


slide-51
SLIDE 51

GrPPI Data patterns Map/reduce pattern

Map/reduce pattern

A map/reduce pattern combines a map pattern and a reduce pattern into a single pattern.

1. One or more data sets are mapped applying a transformation operation.
2. The results are combined by a reduction operation.

A map/reduce could also be expressed as the composition of a map and a reduce.

However, map/reduce may potentially fuse both stages, allowing for extra optimizations.


slide-52
SLIDE 52

GrPPI Data patterns Map/reduce pattern

Map/reduce with single data set

A map/reduce on a single input sequence producing a value.


slide-53
SLIDE 53

GrPPI Data patterns Map/reduce pattern

Map/reduce with single data set

A map/reduce on a single input sequence producing a value. Given:

A sequence x1, x2, . . . , xN ∈ T
A mapping function m : T → R
A reduction identity value id ∈ I.
A combine operation c : I × R → I


slide-54
SLIDE 54

GrPPI Data patterns Map/reduce pattern

Map/reduce with single data set

A map/reduce on a single input sequence producing a value. Given:

A sequence x1, x2, . . . , xN ∈ T
A mapping function m : T → R
A reduction identity value id ∈ I.
A combine operation c : I × R → I

It generates a value reducing the mapping:

c(. . . c(c(id, m1), m2) . . . , mN), where mk = m(xk)
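Sequentially, the fused form amounts to a single pass; a minimal sketch (not the GrPPI implementation, which may parallelize both stages):

```cpp
#include <cassert>
#include <vector>

// Sequential sketch of map/reduce over one sequence: maps each element with
// m and immediately folds the result with c, starting from id, so no
// intermediate sequence is materialized.
template <typename T, typename I, typename Mapper, typename Combiner>
I sequential_map_reduce(const std::vector<T> & v, I id, Mapper m, Combiner c) {
  for (const auto & x : v) id = c(id, m(x));
  return id;
}
```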


slide-55
SLIDE 55

GrPPI Data patterns Map/reduce pattern

Map/reduce pattern


slide-56
SLIDE 56

GrPPI Data patterns Map/reduce pattern

Single sequence map/reduce

Sum of squares

template <typename Execution>
double sum_squares(const Execution & ex, const std::vector<double> & v) {
  return grppi::map_reduce(ex, v.begin(), v.end(), 0.0,
      [](double x) { return x*x; },
      [](double x, double y) { return x+y; });
}


slide-57
SLIDE 57

GrPPI Data patterns Map/reduce pattern

Map/reduce in multiple data sets

A map/reduce on multiple input sequences producing a single value.


slide-58
SLIDE 58

GrPPI Data patterns Map/reduce pattern

Map/reduce in multiple data sets

A map/reduce on multiple input sequences producing a single value. Given:

A sequence x^1_1, x^1_2, . . . , x^1_N ∈ T1
A sequence x^2_1, x^2_2, . . . , x^2_N ∈ T2
. . .
A sequence x^M_1, x^M_2, . . . , x^M_N ∈ TM
A mapping function m : T1 × T2 × . . . × TM → R
A reduction identity value id ∈ I.
A combine operation c : I × R → I


slide-59
SLIDE 59

GrPPI Data patterns Map/reduce pattern

Map/reduce in multiple data sets

A map/reduce on multiple input sequences producing a single value. Given:

A sequence x^1_1, x^1_2, . . . , x^1_N ∈ T1
A sequence x^2_1, x^2_2, . . . , x^2_N ∈ T2
. . .
A sequence x^M_1, x^M_2, . . . , x^M_N ∈ TM
A mapping function m : T1 × T2 × . . . × TM → R
A reduction identity value id ∈ I.
A combine operation c : I × R → I

It generates a value reducing the mapping:

c(. . . c(c(id, m1), m2) . . . , mN), where mk = m(x^1_k, x^2_k, . . . , x^M_k)


slide-60
SLIDE 60

GrPPI Data patterns Map/reduce pattern

Map/reduce on two data sets

Scalar product

template <typename Execution>
double scalar_product(const Execution & ex,
    const std::vector<double> & v1, const std::vector<double> & v2) {
  return grppi::map_reduce(ex, v1.begin(), v1.end(), 0.0,
      [](double x, double y) { return x*y; },
      [](double x, double y) { return x+y; },
      v2.begin());
}


slide-61
SLIDE 61

GrPPI Data patterns Map/reduce pattern

Canonical map/reduce

Given a sequence of words, produce a container where:

The key is the word. The value is the number of occurrences of that word.


slide-62
SLIDE 62

GrPPI Data patterns Map/reduce pattern

Canonical map/reduce

Given a sequence of words, produce a container where:

The key is the word. The value is the number of occurrences of that word.

Word frequencies

template <typename Execution>
auto word_freq(const Execution & ex,
    const std::vector<std::string> & words) {
  using dictionary = std::map<std::string,int>;
  return grppi::map_reduce(ex, words.begin(), words.end(), dictionary{},
      [](const std::string & w) -> dictionary { return {{w,1}}; },
      [](dictionary lhs, const dictionary & rhs) -> dictionary {
        for (const auto & entry : rhs) { lhs[entry.first] += entry.second; }
        return lhs;
      });
}


slide-63
SLIDE 63

GrPPI Data patterns Stencil pattern

2 Data patterns
  Map pattern
  Reduce pattern
  Map/reduce pattern
  Stencil pattern


slide-64
SLIDE 64

GrPPI Data patterns Stencil pattern

Stencil pattern

A stencil pattern applies a transformation to every element in one or multiple data sets, generating a new data set as an output.

The transformation is a function of a data item and its neighbourhood.


slide-65
SLIDE 65

GrPPI Data patterns Stencil pattern

Stencil with single data set

A stencil on a single input sequence producing an output sequence.


slide-66
SLIDE 66

GrPPI Data patterns Stencil pattern

Stencil with single data set

A stencil on a single input sequence producing an output sequence. Given:

A sequence x1, x2, . . . , xN ∈ T
A neighbourhood function n : I → N
A transformation function f : I × N → U


slide-67
SLIDE 67

GrPPI Data patterns Stencil pattern

Stencil with single data set

A stencil on a single input sequence producing an output sequence. Given:

A sequence x1, x2, . . . , xN ∈ T
A neighbourhood function n : I → N
A transformation function f : I × N → U

It generates the sequence:

f(n(x1)), f(n(x2)), . . . , f(n(xN))
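As a concrete sequential sketch, consider a 1-D stencil whose neighbourhood is the immediate left and right elements (illustration only; grppi::stencil runs the per-item computations under the chosen execution policy):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each output element combines the input element with its immediate
// neighbours; boundary elements simply have fewer neighbours.
std::vector<double> neighbour_sum(const std::vector<double> & v) {
  std::vector<double> res(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    double sum = v[i];
    if (i > 0) sum += v[i - 1];               // left neighbour, if any
    if (i + 1 < v.size()) sum += v[i + 1];    // right neighbour, if any
    res[i] = sum;
  }
  return res;
}
```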


slide-68
SLIDE 68

GrPPI Data patterns Stencil pattern

Stencil pattern


slide-69
SLIDE 69

GrPPI Data patterns Stencil pattern

Single sequence stencil

Neighbour average

template <typename Execution>
std::vector<double> neib_avg(const Execution & ex,
    const std::vector<double> & v) {
  std::vector<double> res(v.size());
  grppi::stencil(ex, v.begin(), v.end(), res.begin(),
      [](auto it, auto n) {
        return *it + std::accumulate(n.begin(), n.end(), 0.0);
      },
      [&](auto it) {
        std::vector<double> r;
        if (it != v.begin()) r.push_back(*std::prev(it));
        if (std::distance(it, v.end()) > 1) r.push_back(*std::next(it));
        return r;
      });
  return res;
}


slide-70
SLIDE 70

GrPPI Data patterns Stencil pattern

Stencil with multiple data sets

A stencil on multiple input sequences producing an output sequence.


slide-71
SLIDE 71

GrPPI Data patterns Stencil pattern

Stencil with multiple data sets

A stencil on multiple input sequences producing an output sequence. Given:

A sequence x^1_1, x^1_2, . . . , x^1_N ∈ T1
A sequence x^2_1, x^2_2, . . . , x^2_N ∈ T2
. . .
A sequence x^M_1, x^M_2, . . . , x^M_N ∈ TM
A neighbourhood function n : I1 × I2 × . . . × IM → N
A transformation function f : I1 × N → U


slide-72
SLIDE 72

GrPPI Data patterns Stencil pattern

Stencil with multiple data sets

A stencil on multiple input sequences producing an output sequence. Given:

A sequence x^1_1, x^1_2, . . . , x^1_N ∈ T1
A sequence x^2_1, x^2_2, . . . , x^2_N ∈ T2
. . .
A sequence x^M_1, x^M_2, . . . , x^M_N ∈ TM
A neighbourhood function n : I1 × I2 × . . . × IM → N
A transformation function f : I1 × N → U

It generates the sequence:

f(n(x1)), f(n(x2)), . . . , f(n(xN))


slide-73
SLIDE 73

GrPPI Data patterns Stencil pattern

Multiple sequences stencil

Neighbour average

template <typename It>
std::vector<double> get_around(It i, It first, It last) {
  std::vector<double> r;
  if (i != first) r.push_back(*std::prev(i));
  if (std::distance(i, last) > 1) r.push_back(*std::next(i));
  return r;
}

template <typename Execution>
std::vector<double> neib_avg(const Execution & ex,
    const std::vector<double> & v1, const std::vector<double> & v2) {
  std::vector<double> res(std::min(v1.size(), v2.size()));
  grppi::stencil(ex, v1.begin(), v1.end(), res.begin(),
      [](auto it, auto n) {
        return *it + std::accumulate(n.begin(), n.end(), 0.0);
      },
      [&](auto it1, auto it2) {
        std::vector<double> r = get_around(it1, v1.begin(), v1.end());
        std::vector<double> r2 = get_around(it2, v2.begin(), v2.end());
        std::copy(r2.begin(), r2.end(), std::back_inserter(r));
        return r;
      },
      v2.begin());
  return res;
}


slide-74
SLIDE 74

GrPPI Task Patterns

1

Introduction

2

Data patterns

3

Task Patterns

4

Streaming patterns

5

Writing your own execution

6

Evaluation

7

Conclusions


slide-75
SLIDE 75

GrPPI Task Patterns Divide/conquer pattern

3 Task Patterns
  Divide/conquer pattern


slide-76
SLIDE 76

GrPPI Task Patterns Divide/conquer pattern

Divide/conquer pattern

A divide/conquer pattern splits a problem into two or more independent subproblems until a base case is reached.

The base case is solved directly. The results of the subproblems are combined until the final solution of the original problem is obtained.


slide-77
SLIDE 77

GrPPI Task Patterns Divide/conquer pattern

Divide/conquer pattern

A divide/conquer pattern splits a problem into two or more independent subproblems until a base case is reached.

The base case is solved directly. The results of the subproblems are combined until the final solution of the original problem is obtained.

Key elements:

Divider: Divides a problem into a set of subproblems. Solver: Solves an individual subproblem. Combiner: Combines two solutions.
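The three key elements map directly onto a recursive structure; a sequential sketch computing a vector sum (illustration only; grppi::divide_conquer may solve independent subproblems in parallel):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Divider: split [lo, hi) in two halves. Solver: a single element is the
// base case, solved directly. Combiner: add the two partial sums.
double dc_sum(const std::vector<double> & v, std::size_t lo, std::size_t hi) {
  if (hi - lo <= 1) return (hi > lo) ? v[lo] : 0.0;  // base case
  std::size_t mid = lo + (hi - lo) / 2;              // divide
  return dc_sum(v, lo, mid) + dc_sum(v, mid, hi);    // solve and combine
}
```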


slide-78
SLIDE 78

GrPPI Task Patterns Divide/conquer pattern

Divide/conquer pattern


slide-79
SLIDE 79

GrPPI Task Patterns Divide/conquer pattern

A patterned merge/sort

Ranges on vectors

struct range {
  using iterator = std::vector<double>::iterator;
  range(std::vector<double> & v) : first{v.begin()}, last{v.end()} {}
  range(iterator f, iterator l) : first{f}, last{l} {}  // iterator-pair range
  auto size() const { return std::distance(first, last); }
  iterator first, last;
};

std::vector<range> divide(range r) {
  auto mid = r.first + r.size() / 2;
  return { {r.first, mid}, {mid, r.last} };
}


slide-80
SLIDE 80

GrPPI Task Patterns Divide/conquer pattern

A patterned merge/sort

Ranges on vectors

template <typename Execution>
void merge_sort(const Execution & ex, std::vector<double> & v) {
  grppi::divide_conquer(ex, range(v),
      [](auto r) -> std::vector<range> {
        if (1 >= r.size()) return {r};
        else return divide(r);
      },
      [](auto x) { return x; },
      [](auto r1, auto r2) {
        std::inplace_merge(r1.first, r1.last, r2.last);
        return range{r1.first, r2.last};
      });
}


slide-81
SLIDE 81

GrPPI Streaming patterns

1 Introduction
2 Data patterns
3 Task Patterns
4 Streaming patterns
5 Writing your own execution
6 Evaluation
7 Conclusions


slide-82
SLIDE 82

GrPPI Streaming patterns Pipeline pattern

4 Streaming patterns
  Pipeline pattern
  Execution policies and pipelines
  Farm stages
  Filtering stages
  Reductions in pipelines
  Iterations in pipelines


slide-83
SLIDE 83

GrPPI Streaming patterns Pipeline pattern

Pipeline pattern

A pipeline pattern allows processing a data stream where the computation may be divided into multiple stages.

Each stage processes the data item generated by the previous stage and passes the produced result to the next stage.


slide-84
SLIDE 84

GrPPI Streaming patterns Pipeline pattern

Standalone pipeline

A standalone pipeline is a top-level pipeline.

Invoking the pipeline translates into its execution.


slide-85
SLIDE 85

GrPPI Streaming patterns Pipeline pattern

Standalone pipeline

A standalone pipeline is a top-level pipeline.

Invoking the pipeline translates into its execution.

Given:

A generator g : ∅ → T1 ∪ ∅
A sequence of transformers ti : Ti → Ti+1


slide-86
SLIDE 86

GrPPI Streaming patterns Pipeline pattern

Standalone pipeline

A standalone pipeline is a top-level pipeline.

Invoking the pipeline translates into its execution.

Given:

A generator g : ∅ → T1 ∪ ∅
A sequence of transformers ti : Ti → Ti+1

For every non-empty value generated by g, it evaluates:

tn(tn−1(. . . t1(g())))


slide-87
SLIDE 87

GrPPI Streaming patterns Pipeline pattern

Generators

A generator g is any callable C++ entity that:

Takes no argument. Returns a value of type T that may hold (or not) a value. Null value signals end of stream.


slide-88
SLIDE 88

GrPPI Streaming patterns Pipeline pattern

Generators

A generator g is any callable C++ entity that:

Takes no argument. Returns a value of type T that may hold (or not) a value. Null value signals end of stream.

The return value must be any type that:

Is copy-constructible or move-constructible.

T x = g();


slide-89
SLIDE 89

GrPPI Streaming patterns Pipeline pattern

Generators

A generator g is any callable C++ entity that:

Takes no argument. Returns a value of type T that may hold (or not) a value. Null value signals end of stream.

The return value must be any type that:

Is copy-constructible or move-constructible.

T x = g();

Is contextually convertible to bool

if (x) { /∗ ... ∗/ } if (!x) { /∗ ... ∗/ }


slide-90
SLIDE 90

GrPPI Streaming patterns Pipeline pattern

Generators

A generator g is any callable C++ entity that:

Takes no argument. Returns a value of type T that may hold (or not) a value. Null value signals end of stream.

The return value must be any type that:

Is copy-constructible or move-constructible.

T x = g();

Is contextually convertible to bool

if (x) { /∗ ... ∗/ } if (!x) { /∗ ... ∗/ }

Can be dereferenced

auto val = ∗x;


slide-91
SLIDE 91

GrPPI Streaming patterns Pipeline pattern

Generators

A generator g is any callable C++ entity that:

Takes no argument. Returns a value of type T that may hold (or not) a value. Null value signals end of stream.

The return value must be any type that:

Is copy-constructible or move-constructible.

T x = g();

Is contextually convertible to bool

if (x) { /∗ ... ∗/ } if (!x) { /∗ ... ∗/ }

Can be dereferenced

auto val = ∗x;

The standard library offers an excellent candidate std::experimental::optional<T>.
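With C++17 this candidate became std::optional<T>; a generator satisfying all three requirements might look like this (a sketch; make_counter is an illustrative name, not part of GrPPI):

```cpp
#include <cassert>
#include <optional>

// Generator: a callable taking no arguments whose return value is
// bool-convertible and dereferenceable; an empty optional ends the stream.
auto make_counter(int max) {
  return [i = 0, max]() mutable -> std::optional<int> {
    if (i < max) return i++;   // next item in the stream
    return {};                 // null value: end of stream
  };
}
```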


slide-92
SLIDE 92

GrPPI Streaming patterns Pipeline pattern

Simple pipeline

x -> x*x -> 1/x -> print

template <typename Execution>
void run_pipe(const Execution & ex, int n) {
  grppi::pipeline(ex,
      [i=0, max=n]() mutable -> optional<int> {
        if (i < max) return i++;
        else return {};
      },
      [](int x) -> double { return x*x; },
      [](double x) { return 1/x; },
      [](double x) { cout << x << "\n"; });
}


slide-93
SLIDE 93

GrPPI Streaming patterns Pipeline pattern

Nested pipelines

Pipelines may be nested. An inner pipeline:

Does not take an execution policy. All stages are transformers (no generator). The last stage must also produce values.

The inner pipeline uses the same execution policy as the outer pipeline.


slide-94
SLIDE 94

GrPPI Streaming patterns Pipeline pattern

Nested pipelines

x -> x*x -> 1/x -> print

template <typename Execution>
void run_pipe(const Execution & ex, int n) {
  grppi::pipeline(ex,
      [i=0, max=n]() mutable -> optional<int> {
        if (i < max) return i++;
        else return {};
      },
      grppi::pipeline(
          [](int x) -> double { return x*x; },
          [](double x) { return 1/x; }),
      [](double x) { cout << x << "\n"; });
}


slide-95
SLIDE 95

GrPPI Streaming patterns Pipeline pattern

Piecewise pipelines

A pipeline can be piecewise created. x -> x*x -> 1/x -> print

template <typename Execution>
void run_pipe(const Execution & ex, int n) {
  auto generator = [i=0, max=n]() mutable -> optional<int> {
    if (i < max) return i++;
    else return {};
  };
  auto inner = grppi::pipeline(
      [](int x) -> double { return x*x; },
      [](double x) { return 1/x; });
  auto printer = [](double x) { cout << x << "\n"; };
  grppi::pipeline(ex, generator, inner, printer);
}


slide-96
SLIDE 96

GrPPI Streaming patterns Execution policies and pipelines

4 Streaming patterns
  Pipeline pattern
  Execution policies and pipelines
  Farm stages
  Filtering stages
  Reductions in pipelines
  Iterations in pipelines

67/105
SLIDE 97

GrPPI Streaming patterns Execution policies and pipelines

Ordering

Signals whether pipeline items must be consumed in the same order they were produced.

Do they need to be time-stamped?

Default is ordered.

API

ex.enable_ordering() ex.disable_ordering() bool o = ex.is_ordered()

68/105
SLIDE 98

GrPPI Streaming patterns Execution policies and pipelines

Queueing properties

Some policies (native and omp) use queues to communicate pipeline stages. Properties:

Queue size: buffer size of the queue.
Mode: blocking versus lock-free.

API

ex.set_queue_attributes(100, mode::blocking)
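A blocking inter-stage queue of the kind these policies use can be sketched with a mutex and two condition variables. This is an illustration of the blocking mode only, not GrPPI's actual queue implementation; the class name blocking_queue is hypothetical:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Minimal blocking bounded queue in the spirit of the queues that
// connect pipeline stages: push blocks when full, pop blocks when empty.
template <typename T>
class blocking_queue {
public:
  explicit blocking_queue(std::size_t size) : size_{size} {}

  void push(T item) {
    std::unique_lock<std::mutex> lock{mutex_};
    not_full_.wait(lock, [this] { return items_.size() < size_; });
    items_.push(std::move(item));
    not_empty_.notify_one();
  }

  T pop() {
    std::unique_lock<std::mutex> lock{mutex_};
    not_empty_.wait(lock, [this] { return !items_.empty(); });
    T item = std::move(items_.front());
    items_.pop();
    not_full_.notify_one();
    return item;
  }

private:
  std::size_t size_;
  std::queue<T> items_;
  std::mutex mutex_;
  std::condition_variable not_full_, not_empty_;
};
```

The queue-size attribute above bounds the buffer; a lock-free mode would replace the mutex/condition-variable pair with an atomic ring buffer.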

69/105
SLIDE 99

GrPPI Streaming patterns Farm stages

4

Streaming patterns
  Pipeline pattern
  Execution policies and pipelines
  Farm stages
  Filtering stages
  Reductions in pipelines
  Iterations in pipelines

70/105
SLIDE 100

GrPPI Streaming patterns Farm stages

Farm pattern

A farm is a streaming pattern applicable to a stage in a pipeline, providing multiple tasks to process data items from a data stream.

A farm has an associated cardinality, which is the number of parallel tasks used to serve the stage.

71/105
SLIDE 101

GrPPI Streaming patterns Farm stages

Farms in pipelines

Square values

template <typename Execution>
void run_pipe(const Execution & ex, int n) {
  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<int> {
      if (i<max) return i++; else return {};
    },
    grppi::farm(4,
      [](int x) -> double { return x*x; }),
    [](double x) { cout << x << "\n"; }
  );
}

72/105
SLIDE 102

GrPPI Streaming patterns Farm stages

Piecewise farms

Square values

template <typename Execution>
void run_pipe(const Execution & ex, int n) {
  auto inner = grppi::farm(4,
    [](int x) -> double { return x*x; });
  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<int> {
      if (i<max) return i++; else return {};
    },
    inner,
    [](double x) { cout << x << "\n"; }
  );
}

73/105
SLIDE 103

GrPPI Streaming patterns Filtering stages

4

Streaming patterns
  Pipeline pattern
  Execution policies and pipelines
  Farm stages
  Filtering stages
  Reductions in pipelines
  Iterations in pipelines

74/105
SLIDE 104

GrPPI Streaming patterns Filtering stages

Filter pattern

A filter pattern discards (or keeps) the data items from a data stream based on the outcome of a predicate.

75/105
SLIDE 105

GrPPI Streaming patterns Filtering stages

Filter pattern

A filter pattern discards (or keeps) the data items from a data stream based on the outcome of a predicate. This pattern can be used only as a stage of a pipeline. Alternatives:

Keep: only data items satisfying the predicate are sent to the next stage.
Discard: only data items not satisfying the predicate are sent to the next stage.
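The two alternatives are each other's complement. Outside a pipeline, the same semantics can be sketched over a finite sequence with std::copy_if (the free functions keep and discard here are illustrative stand-ins, not the GrPPI stage factories):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// keep: pass on only the items satisfying the predicate.
template <typename Pred>
std::vector<int> keep(const std::vector<int> & in, Pred pred) {
  std::vector<int> out;
  std::copy_if(in.begin(), in.end(), std::back_inserter(out), pred);
  return out;
}

// discard: pass on only the items NOT satisfying the predicate.
template <typename Pred>
std::vector<int> discard(const std::vector<int> & in, Pred pred) {
  std::vector<int> out;
  std::copy_if(in.begin(), in.end(), std::back_inserter(out),
               [&](int x) { return !pred(x); });
  return out;
}
```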

76/105
SLIDE 106

GrPPI Streaming patterns Filtering stages

Filtering in

Print primes

bool is_prime(int n);

template <typename Execution>
void print_primes(const Execution & ex, int n) {
  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<int> {
      if (i<=max) return i++; else return {};
    },
    grppi::keep(is_prime),
    [](int x) { cout << x << "\n"; }
  );
}

77/105
SLIDE 107

GrPPI Streaming patterns Filtering stages

Filtering out

Discard words

template <typename Execution>
void print_words(const Execution & ex, std::istream & is) {
  grppi::pipeline(ex,
    [&is]() -> optional<string> {
      string word;
      is >> word;
      if (!is) { return {}; }
      else { return word; }
    },
    grppi::discard([](std::string w) { return w.length() < 4; }),
    [](std::string w) { cout << w << "\n"; }
  );
}

78/105
SLIDE 108

GrPPI Streaming patterns Reductions in pipelines

4

Streaming patterns
  Pipeline pattern
  Execution policies and pipelines
  Farm stages
  Filtering stages
  Reductions in pipelines
  Iterations in pipelines

79/105
SLIDE 109

GrPPI Streaming patterns Reductions in pipelines

Stream reduction pattern

A stream reduction pattern performs a reduction over the items of a subset of a data stream

80/105
SLIDE 110

GrPPI Streaming patterns Reductions in pipelines

Stream reduction pattern

A stream reduction pattern performs a reduction over the items of a subset of a data stream. Key elements:

window-size: number of elements in a reduction window.
offset: distance between the start of two consecutive windows.
identity: initial value used for reductions.
combiner: operation used for reductions.
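On a finite sequence the interplay of these four elements can be shown with a short sequential sketch (not GrPPI code; windowed_reduce is a hypothetical name). When offset < window-size, consecutive windows overlap; when offset > window-size, items are skipped:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reduce each window of `window_size` items with `combine`, starting from
// `identity`; consecutive windows begin `offset` items apart.
template <typename T, typename Combiner>
std::vector<T> windowed_reduce(const std::vector<T> & in,
                               std::size_t window_size, std::size_t offset,
                               T identity, Combiner combine) {
  std::vector<T> out;
  for (std::size_t begin = 0; begin < in.size(); begin += offset) {
    T acc = identity;
    const std::size_t end = std::min(begin + window_size, in.size());
    for (std::size_t i = begin; i != end; ++i) acc = combine(acc, in[i]);
    out.push_back(acc);
  }
  return out;
}
```

With window-size 100, offset 50, and identity 0.0 as on the next slide, each output value sums 100 consecutive items and windows overlap by half.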

81/105
SLIDE 111

GrPPI Streaming patterns Reductions in pipelines

Windowed reductions

Chunked sum

template <typename Execution>
void chunked_sum(const Execution & ex, int n) {
  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<double> {
      if (i<=max) return i++; else return {};
    },
    grppi::reduce(100, 50, 0.0,
      [](double x, double y) { return x+y; }),
    [](double x) { cout << x << "\n"; }
  );
}

82/105
SLIDE 112

GrPPI Streaming patterns Iterations in pipelines

4

Streaming patterns
  Pipeline pattern
  Execution policies and pipelines
  Farm stages
  Filtering stages
  Reductions in pipelines
  Iterations in pipelines

83/105
SLIDE 113

GrPPI Streaming patterns Iterations in pipelines

Stream iteration pattern

A stream iteration pattern allows loops in data stream processing.

An operation is applied to a data item until a predicate is satisfied. When the predicate is met, the result is sent to the output stream.

84/105
SLIDE 114

GrPPI Streaming patterns Iterations in pipelines

Stream iteration pattern

A stream iteration pattern allows loops in data stream processing.

An operation is applied to a data item until a predicate is satisfied. When the predicate is met, the result is sent to the output stream.

Key elements:

A transformer that is applied to a data item on each iteration.
A predicate to determine when the iteration has finished.
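Applied to a single item, these semantics reduce to a loop, which can be sketched sequentially as follows (a sketch of the behavior only, not the GrPPI repeat_until stage, which operates inside a pipeline):

```cpp
// Apply `transform` repeatedly to one item until `done` holds on the
// result, then return it — the per-item behavior of stream iteration.
template <typename T, typename Transformer, typename Predicate>
T repeat_until(T item, Transformer transform, Predicate done) {
  do {
    item = transform(item);
  } while (!done(item));
  return item;
}
```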

84/105
SLIDE 115

GrPPI Streaming patterns Iterations in pipelines

Iterating

Print values 2n ∗ x

template <typename Execution>
void print_values(const Execution & ex, int n) {
  auto generator = [i=1,max=n+1]() mutable -> optional<int> {
    if (i<max) return i++; else return {};
  };
  grppi::pipeline(ex,
    generator,
    grppi::repeat_until(
      [](int x) { return 2*x; },
      [](int x) { return x>1024; }),
    [](int x) { cout << x << endl; }
  );
}

85/105
SLIDE 116

GrPPI Writing your own execution

1

Introduction

2

Data patterns

3

Task Patterns

4

Streaming patterns

5

Writing your own execution

6

Evaluation

7

Conclusions

86/105
SLIDE 117

GrPPI Writing your own execution

Adding a new policy

Adding a new execution policy is done by writing a new class.

No inheritance needed.

“Inheritance is the base class of all evils” (Sean Parent).

No dependency on the library. Additionally, configure some meta-functions (until we have Concepts).

87/105
SLIDE 118

GrPPI Writing your own execution

My custom execution

my_execution

class my_execution {
public:
  my_execution() noexcept;
  void set_concurrency_degree(int n) noexcept;
  int concurrency_degree() const noexcept;
  void enable_ordering() noexcept;
  void disable_ordering() noexcept;
  bool is_ordered() const noexcept;
  // ...
};

template <>
constexpr bool is_supported<my_execution>() { return true; }

88/105
SLIDE 119

GrPPI Writing your own execution

Adding a pattern

my_execution::map

class my_execution {
  // ...
  template <typename ... InputIterators, typename OutputIterator,
            typename Transformer>
  constexpr void map(std::tuple<InputIterators...> firsts,
                     OutputIterator first_out,
                     std::size_t sequence_size,
                     Transformer && transform_op) const;
  // ...
};

template <>
constexpr bool supports_map<my_execution>() { return true; }

89/105
SLIDE 120

GrPPI Writing your own execution

Some helpers in the library

Applying a function to a tuple of iterators

template <typename F, typename ... Iterators,
          template <typename...> class T>
decltype(auto) apply_deref_increment(
    F && f, T<Iterators...> & iterators)

Takes a function f and a tuple of iterators (e.g. the result of make_tuple(it1, it2, it3)). Returns f(*it1++, *it2++, *it3++). Very convenient for implementing data patterns. More like this in include/common/iterator.h.
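One possible implementation sketch uses C++17 std::apply to unpack the iterator tuple; the library's actual version in include/common/iterator.h may differ in its details:

```cpp
#include <tuple>
#include <utility>

// Call f(*it1++, *it2++, ...) over a tuple of iterators: dereference
// each iterator, pass the values to f, and post-increment every one.
template <typename F, typename Tuple>
decltype(auto) apply_deref_increment(F && f, Tuple & iterators) {
  return std::apply(
      [&f](auto & ... its) -> decltype(auto) {
        return std::forward<F>(f)((*its++)...);
      },
      iterators);
}
```

Because the tuple is taken by reference, repeated calls walk all the sequences in lockstep, which is exactly what a map over several inputs needs.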

90/105
SLIDE 121

GrPPI Writing your own execution

Implementing map

map

template <typename ... InputIterators, typename OutputIterator,
          typename Transformer>
void my_execution_native::map(std::tuple<InputIterators...> firsts,
                              OutputIterator first_out,
                              std::size_t sequence_size,
                              Transformer transform_op) const
{
  using namespace std;
  auto process_chunk =
      [&transform_op](auto fins, std::size_t size, auto fout) {
    const auto l = next(get<0>(fins), size);
    while (get<0>(fins) != l) {
      *fout++ = apply_deref_increment(
          std::forward<Transformer>(transform_op), fins);
    }
  };
  // ...

91/105
SLIDE 122

GrPPI Writing your own execution

Implementing map

map

  // ...
  const int chunk_size = sequence_size / concurrency_degree_;
  {
    some_worker_pool workers;
    for (int i=0; i!=concurrency_degree_-1; ++i) {
      const auto delta = chunk_size * i;
      const auto chunk_firsts = iterators_next(firsts, delta);
      const auto chunk_first_out = next(first_out, delta);
      workers.launch(process_chunk, chunk_firsts, chunk_size,
                     chunk_first_out);
    }
    const auto delta = chunk_size * (concurrency_degree_ - 1);
    const auto chunk_firsts = iterators_next(firsts, delta);
    const auto chunk_first_out = next(first_out, delta);
    process_chunk(chunk_firsts, sequence_size - delta, chunk_first_out);
  } // Implicit pool synch
}
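The same chunking scheme can be run standalone with std::thread standing in for the slide's some_worker_pool (parallel_map is a hypothetical name; the calling thread handles the last chunk plus any remainder, exactly as above):

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Chunked parallel map over a vector: degree-1 workers each transform
// one chunk; the caller processes the final chunk and the remainder.
template <typename T, typename R, typename Transformer>
std::vector<R> parallel_map(const std::vector<T> & in, int degree,
                            Transformer transform) {
  std::vector<R> out(in.size());
  const std::size_t chunk = in.size() / degree;
  auto process = [&](std::size_t first, std::size_t last) {
    for (std::size_t i = first; i != last; ++i) out[i] = transform(in[i]);
  };
  std::vector<std::thread> workers;
  for (int w = 0; w != degree - 1; ++w) {
    workers.emplace_back(process, w * chunk, (w + 1) * chunk);
  }
  process((degree - 1) * chunk, in.size());  // last chunk + remainder
  for (auto & t : workers) t.join();         // explicit "pool" synch
  return out;
}
```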

92/105
SLIDE 123

GrPPI Evaluation

1

Introduction

2

Data patterns

3

Task Patterns

4

Streaming patterns

5

Writing your own execution

6

Evaluation

7

Conclusions

93/105
SLIDE 124

GrPPI Evaluation

Evaluation

Platform:

2 × Intel Xeon Ivy Bridge E5-2695 v2.
Total number of cores: 24.
Clock frequency: 2.40 GHz.
L3 cache size: 30 MB.
Main memory: 128 GB DDR3.
OS: Ubuntu Linux 14.04 LTS, kernel 3.13.

Software:

Compiler: GCC 6.2.
OpenMP 4.0: included in GCC.
ISO C++ Threads: included in the C++ STL.
Intel TBB: www.threadingbuildingblocks.org

94/105
SLIDE 125

GrPPI Evaluation

Use case

Video processing application for detecting edges using the filters:

Gaussian Blur
Sobel operator

It uses a pipeline pattern:

S1: Reading frames from a camera
S2: Apply the Gaussian Blur filter (it can use a farm)
S3: Apply the Sobel operator (it can use a farm)
S4: Writing frames into a file

Parallel variants:

Using the back ends directly.
Using GrPPI.

95/105
SLIDE 126

GrPPI Evaluation

Pipeline compositions

Pipeline+farm compositions made in the video application:

(a) Non-composed pipeline.
(b) Pipeline ( s | f | s | s ).
(c) Pipeline ( s | s | f | s ).
(d) Pipeline ( s | f | f | s ).

96/105
SLIDE 127

GrPPI Evaluation

Usability of GrPPI

% increase in lines of code w.r.t. the sequential version:

Composition        C++ Threads   OpenMP    Intel TBB   GrPPI
( p | p | p | p )     +8.8 %     +13.0 %    +25.9 %    +1.8 %
( p | f | p | p )    +59.4 %     +62.6 %    +25.9 %    +3.1 %
( p | p | f | p )    +60.0 %     +63.9 %    +25.9 %    +3.1 %
( p | f | f | p )   +106.9 %    +109.4 %    +25.9 %    +4.4 %

97/105
SLIDE 128

GrPPI Evaluation

Performance: frames per second

[Figure: frames per second versus video resolution (480p–2160p) for the four pipeline compositions (p|p|p|p), (p|f|p|p), (p|p|f|p), and (p|f|f|p), comparing C++11 threads, OpenMP, and Intel TBB used directly and through GrPPI.]

98/105
SLIDE 129

GrPPI Evaluation

Observations

Using farm for both stages leads to an improved FPS rate.
Using farm for only one stage does not bring any significant improvement.

Impact of GrPPI on performance:

Negligible overheads of about 2%.

Impact on programming effort:

Significantly less effort with respect to other programming models.

99/105
SLIDE 130

GrPPI Conclusions

1

Introduction

2

Data patterns

3

Task Patterns

4

Streaming patterns

5

Writing your own execution

6

Evaluation

7

Conclusions

100/105
SLIDE 131

GrPPI Conclusions

Summary

A unified programming model for sequential and parallel modes.
Multiple back ends available.
Current pattern set:

Data: map, reduce, map/reduce, stencil.
Task: divide/conquer.
Streaming: pipeline with nesting of farm, filter, reduction, iteration.

Current limitation:

Pipelines cannot be nested inside other patterns (e.g. iteration of a pipeline).

101/105
SLIDE 132

GrPPI Conclusions

Future work

Integrate additional back ends (e.g. FastFlow, CUDA).
Eliminate metaprogramming by using Concepts.
Extend and simplify the interface for data patterns.
Support multi-context patterns.
Better support of NUMA for the native back end.
More patterns.
More applications.

102/105
SLIDE 133

GrPPI Conclusions

Recent publications

A Generic Parallel Pattern Interface for Stream and Data Processing. D. del Rio, M. F. Dolz, J. Fernández, J. D. García. Concurrency and Computation: Practice and Experience, 2017.

Supporting Advanced Patterns in GrPPI: a Generic Parallel Pattern Interface. D. R. del Astorga, M. F. Dolz, J. Fernandez, J. D. Garcia. Auto-DaSP 2017 (Euro-Par 2017).

Probabilistic-Based Selection of Alternate Implementations for Heterogeneous Platforms. J. Fernandez, A. Sanchez, D. del Río, M. F. Dolz, J. D. Garcia. ICA3PP 2017.

A C++ Generic Parallel Pattern Interface for Stream Processing. D. del Río, M. F. Dolz, L. M. Sanchez, J. Garcia-Blas, J. D. Garcia. ICA3PP 2016.

Finding parallel patterns through static analysis in C++ applications. D. R. del Astorga, M. F. Dolz, L. M. Sanchez, J. D. Garcia, M. Danelutto, M. Torquati. International Journal of High Performance Computing Applications, 2017.

103/105
SLIDE 134

GrPPI Conclusions

GrPPI https://github.com/arcosuc3m/grppi

104/105
SLIDE 135

GrPPI Conclusions

GrPPI

Generic Reusable Parallel Patterns Interface

ARCOS Group University Carlos III of Madrid Spain

January 2018

105/105