

SLIDE 1

Introduction Programming Model The MPI Threads API Summary & Conclusions

A Middleware for Concurrent Programming in MPI Applications

Tobias Berka, Helge Hagenauer and Marian Vajteršic

September 13, 2011

SLIDE 2

Outline

1 Introduction

  • Emergent Parallel Applications
  • The Need for Concurrency

2 Programming Model

  • Concurrency using Threads
  • Thread Collectives
  • In Actual Use

3 The MPI Threads API

  • The MPIT Interface Definition
  • Constructs and Features
  • Performance Overhead

4 Summary & Conclusions

SLIDE 3

Introduction

SLIDE 4

Emergent Parallel Applications

Parallelism is abundant in today’s data centers:

  • Multi-core CPUs,
  • High-bandwidth, low-latency interconnection networks,
  • Accelerator hardware.

Exciting new applications in today’s information economy:

  • Information retrieval (i.e. search),
  • Online analytical processing,
  • Recommender systems,
  • Data mining.

SLIDE 5

Use Case: Parallel Search Engine

Requirements beyond the classic batch-job operation:

[Diagram: four operations on the Data: Add Document, Update Document, Remove Document, Query Documents.]

SLIDE 6

Use Case: Parallel Search Engine

We group similar operations – short and long:

[Diagram: Add Document, Update Document and Remove Document are grouped into the Maintenance Layer; Query Documents forms the Query Layer; both layers operate on the Data.]

SLIDE 7

The Need for Concurrency

Multi-User Operation

  • Multiple users,
  • Single back-end,
  • Single database...

Both layers can be used concurrently, at the same time:

  • Answer queries,
  • Modify the database.

⇒ We need concurrency!

SLIDE 8

Programming Model

SLIDE 9

How do we implement concurrent activities?

Operations and queues:

  • Data structure to describe operations,
  • Queue holds operations,
  • “Main loop” pops operations and processes them.

Threads:

One activity = one thread.
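The "operations and queues" approach can be sketched in plain C as follows. This is a minimal illustration, not code from the talk: the names `operation`, `op_queue`, `queue_push`, `queue_pop` and `main_loop`, the fixed capacity, and the `doc_id` payload are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation kinds, named after the search-engine use case. */
typedef enum { OP_ADD, OP_UPDATE, OP_REMOVE, OP_QUERY } op_kind;

/* Data structure describing one operation (payload is illustrative). */
typedef struct {
    op_kind kind;
    int     doc_id;
} operation;

#define QUEUE_CAPACITY 16

/* Fixed-size FIFO ring buffer that holds pending operations. */
typedef struct {
    operation items[QUEUE_CAPACITY];
    size_t    head;   /* index of the oldest element */
    size_t    count;  /* number of stored elements   */
} op_queue;

void queue_init(op_queue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Returns 1 on success, 0 if the queue is full. */
int queue_push(op_queue *q, operation op)
{
    if (q->count == QUEUE_CAPACITY)
        return 0;
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = op;
    q->count++;
    return 1;
}

/* Returns 1 and stores the oldest operation in *out, or 0 if empty. */
int queue_pop(op_queue *q, operation *out)
{
    assert(out != NULL);
    if (q->count == 0)
        return 0;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return 1;
}

/* "Main loop": pops operations one at a time and processes them.
 * Processing is a stand-in here: it only counts operations per kind.
 * Returns the number of operations processed. */
size_t main_loop(op_queue *q, size_t handled[4])
{
    operation op;
    size_t n = 0;
    while (queue_pop(q, &op)) {
        handled[op.kind]++;
        n++;
    }
    return n;
}
```

Because a single loop handles one operation at a time, a long-running operation delays everything queued behind it.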

SLIDE 10

The pros and cons...

Operations and queues:

+ Efficient,
- No true concurrency,
- Cannot process an operation and receive independent messages at the same time.

Threads:

- Context switching overhead,
- Shared data requires locking,
+ Very tidy abstraction,
+ Compositional (can always add more threads).

SLIDE 11

Use Case: Parallel Search Engine

Let’s use threads to implement these concurrent activities:

[Diagram: the Maintenance Layer (Add Document, Update Document, Remove Document) is served by a Maintenance Thread with a Queue; the Query Layer (Query Documents) is served by a Query Thread with a Queue; both operate on the Data.]

SLIDE 12

Programming Abstraction

Key abstraction: thread collective.

Goals:

  • Encapsulate concurrent activities,
  • Isolate concurrent communication,
  • Unify and simplify the design.

Conflicting objectives:

  • Safety and ease of programmability,
  • Performance.

SLIDE 13

Thread Collectives

  • Creates a new thread within every MPI process (T1 → T2),
  • Assigns a copy of the MPI communicator (C1 → C2).

[Figure: four MPI processes P1–P4, each with threads T1 and T2, and communicators C1 and C2.]

SLIDE 14

Thread Collectives

  • Encapsulates computation: thread function(s) for T2,
  • Isolates communication: communicator C2.

[Figure: four MPI processes P1–P4, each with threads T1 and T2, and communicators C1 and C2.]

SLIDE 15

Parallel Search Engine

  • P1–P4 each hold a part of all documents,
  • T1s: Answer queries (query layer),
  • T2s: Add, remove or update documents (maintenance layer).

[Figure: four MPI processes P1–P4, each with threads T1 and T2, and communicators C1 and C2.]

SLIDE 16

What do we get?

  • Simple, ready-made abstraction,
  • Encapsulate & isolate,
  • Compositional,
  • Caveat: synchronization and locking.

SLIDE 17

The MPI Threads API

SLIDE 18

The MPIT Interface Definition

  • Additional layer of middleware to provide what we need,
  • Designed as a library for compatibility (not a new programming language),
  • The “MPI Threads” (MPIT) interface definition,
  • Written as a single C header file (157 physical SLOC [1]).

[1] According to David A. Wheeler’s “SLOCCount”.

SLIDE 19

Constructs and Features

Thread collectives:

  • One thread within every MPI process,
  • Separate MPI communicator.

Conventional threads:

  • Portable thread interface,
  • We get it “for free” – we have all of the machinery.

Process-local synchronization constructs:

  • Mutex locks, condition variables, semaphores and barriers,
  • Specifies reliable semantics.
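As an illustration of how one such process-local construct can sit on top of the others, here is a minimal counting semaphore built from a POSIX mutex and condition variable. It is a sketch under assumptions, not the MPIT prototype: the type `local_sem` and its functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical process-local counting semaphore. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;  /* signalled when count becomes positive */
    unsigned        count;
} local_sem;

/* Returns 0 on success, -1 if a POSIX primitive could not be created. */
int local_sem_init(local_sem *s, unsigned initial)
{
    assert(s != NULL);
    s->count = initial;
    if (pthread_mutex_init(&s->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&s->nonzero, NULL) != 0) {
        pthread_mutex_destroy(&s->lock);
        return -1;
    }
    return 0;
}

/* Increments the counter and wakes one waiter, if any. */
void local_sem_post(local_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

/* Blocks until the counter is positive, then decrements it. */
void local_sem_wait(local_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking variant: returns 1 if acquired, 0 otherwise. */
int local_sem_trywait(local_sem *s)
{
    int acquired = 0;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        acquired = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return acquired;
}

void local_sem_destroy(local_sem *s)
{
    pthread_cond_destroy(&s->nonzero);
    pthread_mutex_destroy(&s->lock);
}
```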

SLIDE 20

Performance Overhead

  • Additional layer of indirection (1 additional function call),
  • Additional error and consistency checks:
      • Condition variable checks for spurious wake-ups,
      • Barrier verifies thread identity.

⇒ Execution time overhead.

MPIT prototype implemented on top of POSIX threads (2,650 physical SLOC [2]).

[2] According to David A. Wheeler’s “SLOCCount”.
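The slides do not show how the prototype performs these checks. One plausible reading, sketched below with POSIX threads, is a barrier that rejects callers that were not registered as members and that re-tests its release condition after every wake-up. `checked_barrier`, its functions, `try_enter` and `MAX_PARTIES` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_PARTIES 8

/* Hypothetical barrier that knows which threads may use it. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  released;
    pthread_t       members[MAX_PARTIES]; /* registered identities      */
    unsigned        parties;              /* number of registered ids   */
    unsigned        waiting;              /* arrivals in this round     */
    unsigned long   generation;           /* incremented on each release */
} checked_barrier;

/* Returns 0 on success, -1 on invalid size or failed initialisation. */
int checked_barrier_init(checked_barrier *b, const pthread_t *members,
                         unsigned parties)
{
    assert(b != NULL && members != NULL);
    if (parties == 0 || parties > MAX_PARTIES)
        return -1;
    for (unsigned i = 0; i < parties; i++)
        b->members[i] = members[i];
    b->parties = parties;
    b->waiting = 0;
    b->generation = 0;
    if (pthread_mutex_init(&b->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&b->released, NULL) != 0) {
        pthread_mutex_destroy(&b->lock);
        return -1;
    }
    return 0;
}

/* Returns 0 once all members have arrived, -1 if the caller is not
 * a registered member (identity check). */
int checked_barrier_wait(checked_barrier *b)
{
    pthread_t self = pthread_self();
    int known = 0;
    for (unsigned i = 0; i < b->parties; i++) {
        if (pthread_equal(self, b->members[i])) {
            known = 1;
            break;
        }
    }
    if (!known)
        return -1;

    pthread_mutex_lock(&b->lock);
    unsigned long gen = b->generation;
    if (++b->waiting == b->parties) {
        /* Last arrival: start a new round and release everyone. */
        b->waiting = 0;
        b->generation++;
        pthread_cond_broadcast(&b->released);
    } else {
        /* Re-test after every wake-up: a spurious wake-up leaves the
         * generation unchanged, so the thread goes back to waiting. */
        while (gen == b->generation)
            pthread_cond_wait(&b->released, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
    return 0;
}

/* Demonstration entry point: returns its argument if the barrier
 * accepted the calling thread, NULL if it was rejected. */
void *try_enter(void *p)
{
    checked_barrier *b = p;
    return checked_barrier_wait(b) == 0 ? p : NULL;
}
```

Both checks cost a few extra instructions per call compared with calling the POSIX primitives directly, which is the kind of overhead the slide refers to.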

SLIDE 21

Thread Creation

Difference: additional indirection.

[Plot: “create and join threads”; x-axis: threads (2 to 8); y-axis: milliseconds (0.02 to 0.14); series: MPIT prototype, POSIX threads.]
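A measurement of this kind can be reproduced for the POSIX threads baseline with a small timing routine. The slides do not describe the benchmark method, so everything here is an assumption: the name `time_create_join_ms`, the averaging over rounds, and the use of `CLOCK_MONOTONIC`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_THREADS 64

/* Thread body that does no work, so only create/join cost is timed. */
void *noop_thread(void *arg)
{
    return arg;
}

/* Creates and joins `nthreads` threads, `rounds` times over.
 * Returns the mean wall-clock time per round in milliseconds,
 * or -1.0 if a thread could not be created. */
double time_create_join_ms(int nthreads, int rounds)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS && rounds > 0);
    pthread_t tids[MAX_THREADS];
    struct timespec start, stop;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int r = 0; r < rounds; r++) {
        int created = 0;
        while (created < nthreads &&
               pthread_create(&tids[created], NULL, noop_thread, NULL) == 0)
            created++;
        /* Join whatever was created, even on partial failure. */
        for (int i = 0; i < created; i++)
            pthread_join(tids[i], NULL);
        if (created < nthreads)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &stop);

    double ms = (double)(stop.tv_sec - start.tv_sec) * 1e3
              + (double)(stop.tv_nsec - start.tv_nsec) / 1e6;
    return ms / rounds;
}
```

Calling it for 2, 4, 6 and 8 threads gives the baseline series; a wrapper library would be timed the same way through its own create and join calls.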

SLIDE 22

Lock / Unlock Mutex

Difference: additional checks.

[Plot: “lock and unlock mutex”; x-axis: threads (2 to 8); y-axis: milliseconds (0.05 to 0.3); series: MPIT prototype, POSIX threads.]

SLIDE 23

Wait / Wake on Condition Variable

Difference: indirection & checks (different time scale).

[Plot: “wait / wake on condition variable”; x-axis: threads (2 to 8); y-axis: milliseconds (1 to 6); series: MPIT prototype, POSIX threads.]

SLIDE 24

Summary & Conclusions

SLIDE 25

Summary & Conclusions

  • Parallel programs may require additional concurrency,
  • New abstraction: thread collectives,
  • MPIT interface specification,
  • Performance penalty is acceptable...
  • ...unless too many locks are used.

SLIDE 26

Thank you!
