Concurrent and parallel programming - Seminars in Advanced Topics - PowerPoint PPT Presentation

SLIDE 1

Seminars in Advanced Topics in Computer Science Engineering 2019/2020

Romolo Marotta

Concurrent and parallel programming

SLIDE 2

Trend in processor technology

Concurrent and parallel programming 2

SLIDE 3

Blocking synchronization

[figure: a thread accessing the SHARED RESOURCE]

SLIDE 4

Blocking synchronization

[figure: a waiting thread (…zZz…) blocked on the SHARED RESOURCE]

SLIDE 5

Blocking synchronization

[figure: a waiting thread (…zZz…) blocked on the SHARED RESOURCE]

  • Correctness is guaranteed by mutual exclusion
  • Performance might be hampered because of wasted clock cycles
  • Liveness might be impaired due to the arbitration of accesses
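As a concrete illustration (my own sketch, not from the slides), the blocking pattern can be rendered in Python: every thread must take a lock before touching the shared resource, so correctness is preserved while waiting threads lose cycles.

```python
import threading

# A shared counter standing in for the SHARED RESOURCE of the slides.
# Mutual exclusion via a lock guarantees correctness; a thread that finds
# the lock taken must wait (the "...zZz..." in the figure).
shared_counter = 0
lock = threading.Lock()

def worker(increments):
    global shared_counter
    for _ in range(increments):
        with lock:                 # blocking synchronization
            shared_counter += 1    # critical section

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_counter)  # 40000: mutual exclusion preserves correctness
```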
SLIDE 6

Parallel programming

  • Ad-hoc concurrent programming languages
  • Development tools
  • Compilers
  • MPI, OpenMP, libraries
  • Tools to debug parallel code (gdb, valgrind)
  • Writing parallel code is an art
  • There are approaches, not prepackaged solutions
  • Every machine has its own singularities
  • Every problem to face has different requirements
  • The most efficient parallel algorithm might not be the most intuitive one

SLIDE 7

What do we want from parallel programs?

  • Safety: nothing wrong happens (Correctness)
  • parallel versions of our programs should be as correct as their sequential implementations
  • Liveness: something good happens eventually (Progress)
  • if a sequential program terminates with a given input, we want its parallel alternative to complete with the same input as well
  • Performance
  • we want to exploit our parallel hardware

SLIDE 8

  • Correctness conditions
  • Progress conditions
  • Performance

SLIDE 9

Correctness

  • What does it mean for a program to be correct?
  • What’s exactly a concurrent FIFO queue?
  • FIFO implies a strict temporal ordering
  • Concurrency implies an ambiguous temporal ordering
  • Intuitively, if we rely on locks, changes happen in a non-interleaved fashion, resembling a sequential execution
  • We can say a concurrent execution is correct only because we can associate it with a sequential one, whose functioning we know
  • An execution is correct if it is equivalent to a correct sequential execution

SLIDE 10

Correctness

  • An execution is correct if it is equivalent to a correct sequential execution

SLIDE 11

A simplified model of a concurrent system

  • A concurrent system is a collection of sequential threads/processes that communicate through shared data structures called objects.
  • An object has a unique name and a set of primitive operations.

SLIDE 12

A simplified model of a concurrent execution

  • A history is a sequence of invocations and replies generated on an object by a set of threads
  • Invocation: A op(args*) x, where A is the thread id, op the method name, args* the list of parameters, and x the object instance
  • Reply: A ret(res*) x, where ret is the reply token and res* the list of returned values

SLIDE 13

A simplified model of a concurrent execution

  • A sequential history is a history where all the invocations have an immediate response
  • A concurrent history is a history that is not sequential

Sequential H’: A op() x, A ret() x, B op() x, B ret() x, A op() y, A ret() y
Concurrent H: A op() x, B op() x, A ret() x, A op() y, B ret() x, A ret() y

SLIDE 14

Correctness

 A history is correct if it is equivalent to a correct sequential history

  • An execution is correct if it is equivalent to a correct sequential execution

SLIDE 15

A simplified model of a concurrent execution

  • A process subhistory H|P of a history H is the subsequence of all events in H whose process names are P

H: A op() x, B op() x, A ret() x, A op() y, B ret() x, A ret() y
H|A: A op() x, A ret() x, A op() y, A ret() y

  • Process subhistories are always sequential

SLIDE 20

Equivalence between histories

  • Two histories H and H’ are equivalent if for every process P, H|P = H’|P

H:  A op() x, B op() x, A ret() x, A op() y, B ret() x, A ret() y
H’: B op() x, B ret() x, A op() x, A ret() x, A op() y, A ret() y

H|A = H’|A: A op() x, A ret() x, A op() y, A ret() y
H|B = H’|B: B op() x, B ret() x

so H and H’ are equivalent
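These definitions translate directly into code. A minimal Python sketch (helper names `subhistory` and `equivalent` are mine): a history is a list of (process, event) pairs, H|P filters by process, and two histories are equivalent when every process subhistory matches.

```python
def subhistory(h, p):
    """H|P: the subsequence of events of H issued by process p."""
    return [(proc, ev) for (proc, ev) in h if proc == p]

def equivalent(h1, h2):
    """H and H' are equivalent iff H|P == H'|P for every process P."""
    procs = {proc for (proc, _) in h1} | {proc for (proc, _) in h2}
    return all(subhistory(h1, p) == subhistory(h2, p) for p in procs)

# The two histories from the slide: concurrent H and sequential H'.
H  = [("A", "op() x"), ("B", "op() x"), ("A", "ret() x"),
      ("A", "op() y"), ("B", "ret() x"), ("A", "ret() y")]
H2 = [("B", "op() x"), ("B", "ret() x"), ("A", "op() x"),
      ("A", "ret() x"), ("A", "op() y"), ("A", "ret() y")]
```

`equivalent(H, H2)` holds because both per-process subhistories coincide, exactly as on the slide.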

SLIDE 27

Correctness conditions

  • A concurrent execution is correct if it is equivalent to a correct sequential execution
  • A history is correct if it is equivalent to a correct sequential history which satisfies a given correctness condition
  • A correctness condition specifies the set of histories to be considered as reference
  • In order to correctly implement a concurrent object w.r.t. a correctness condition, we must guarantee that every possible history on our implementation satisfies the correctness condition

SLIDE 28

Sequential Consistency [Lamport 1979]

  • A history H is sequentially consistent if:
  • 1. it is equivalent to a sequential history S
  • 2. S is legal according to the sequential definition of the object

 An object implementation is sequentially consistent if every history associated with its usage is sequentially consistent

SLIDE 29

Sequential Consistency [Lamport 1979]

[figure: timeline where A executes Enq(1) while B executes Enq(2) and then Deq(2)]

H: A Enq(1) x, A ret() x, B Enq(2) x, B ret() x, B Deq(2) x, B ret() x
H|A: A Enq(1) x, A ret() x
H|B: B Enq(2) x, B ret() x, B Deq(2) x, B ret() x
H’: B Enq(2) x, B ret() x, A Enq(1) x, A ret() x, B Deq(2) x, B ret() x

  • H’ is legal and sequential
  • H is equivalent to H’
  • H is correct w.r.t. sequential consistency
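The worked example can be checked mechanically. A brute-force Python sketch (all names are mine) that simplifies the model by treating each complete operation as a single atomic event, merging invocation and response: a history is sequentially consistent if some permutation of it preserves every per-thread order and is legal for a FIFO queue.

```python
from itertools import permutations

def legal_fifo(ops):
    """Is a *sequential* list of queue operations legal w.r.t. FIFO semantics?"""
    q = []
    for op in ops:
        if op.startswith("Enq"):
            q.append(int(op[4:-1]))
        else:                       # "Deq(v)" must return the current head
            v = int(op[4:-1])
            if not q or q[0] != v:
                return False
            q.pop(0)
    return True

def subhistory(h, t):
    return [op for (tid, op) in h if tid == t]

def sequentially_consistent(h):
    """Brute force: find an equivalent, legal sequential reordering of h."""
    threads = {t for (t, _) in h}
    for perm in permutations(h):
        if all(subhistory(list(perm), t) == subhistory(h, t) for t in threads) \
           and legal_fifo([op for (_, op) in perm]):
            return True
    return False

# The slide's history: A enqueues 1 while B enqueues 2 and then dequeues 2.
H = [("A", "Enq(1)"), ("B", "Enq(2)"), ("B", "Deq(2)")]
```

`sequentially_consistent(H)` is true: the reordering Enq(2), Enq(1), Deq(2) preserves both thread orders and is FIFO-legal, which is exactly the H’ found on the slide.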
SLIDE 34

Linearizability [Herlihy 1990]

  • A concurrent execution is linearizable if:
  • Each procedure appears to be executed at an indivisible point in time (the linearization point) between its invocation and completion
  • The order among those points is correct according to the sequential definition of objects

SLIDE 35

Linearizability [Herlihy 1990]

[figure: animation over the same timeline, A’s Enq(1) overlapping B’s Enq(2) and Deq(2), placing a linearization point inside each operation’s interval]
SLIDE 43

Linearizability [Herlihy 1990]

  • A history H is linearizable if:
  • 1. it is equivalent to a sequential history S
  • 2. S is correct according to the sequential definition of objects
  • 3. if a response precedes an invocation in the original history, then it must precede it in the sequential one as well

 An object implementation is linearizable if every history associated with its usage can be linearized

SLIDE 44

Linearizability [Herlihy 1990]

  • Linearizability requires:
  • Sequential Consistency
  • Real-time order
  • Linearizability ⇒ Sequential Consistency
  • The composition of linearizable histories is still linearizable
  • Linearizability is a local property (closed under composition)

SLIDE 45

Quick look on transaction correctness conditions

  • We can see a transaction as a set of procedures on different objects that has to appear as atomic
  • Serializability requires that transactions appear to execute sequentially, i.e., without interleaving
  • A sort of sequential consistency for multi-object atomic procedures
  • Strict Serializability requires that the transactions’ order in the sequential history is compatible with their precedence order
  • A sort of linearizability for multi-object atomic procedures

SLIDE 46

A bird’s eye view on correctness conditions

[figure: diagram relating Serializability, Sequential Consistency, Linearizability, Strict Serializability and Opacity]

  • Serializability and Strict Serializability predicate only on committed transactions
  • Opacity also restricts aborted transactions (required for Transactional Memory)

SLIDE 48

Correctness conditions (incomplete) taxonomy

[table: Sequential Consistency, Linearizability, Serializability and Strict Serializability compared on: equivalence to a sequential order, respect of program order in each thread, consistency with real-time ordering, atomic access to multiple objects, locality]

SLIDE 49

  • Correctness conditions
  • Progress conditions
  • Performance

SLIDE 50

Progress conditions

  • Deadlock-freedom:
  • Some thread acquires a lock eventually
  • Starvation-freedom:
  • Every thread acquires a lock eventually


SLIDE 51

Blocking synchronization

[figure: a waiting thread (…zZz…) blocked on the SHARED RESOURCE]

The scheduler should guarantee that the thread holding the lock completes its critical section

SLIDE 52

Scheduler’s role

Progress conditions on multiprocessors:
  • are not only about guarantees provided by a method implementation
  • are also about the scheduling support needed to provide progress

Requirement for lock-based applications:
  • Fair histories: every thread takes an infinite number of concrete steps


SLIDE 54

Progress conditions

  • Deadlock-freedom:
  • Some thread acquires a lock eventually
  • Some method call completes in every fair execution
  • Starvation-freedom:
  • Every thread acquires a lock eventually
  • Every method call completes in every fair execution
  • Lock-freedom:
  • Some method call completes in every execution
  • Wait-freedom:
  • Every method call completes in every execution
  • Obstruction-freedom:
  • Every method call, which executes in isolation, completes

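The gap between deadlock-freedom and starvation-freedom can be made concrete with a ticket lock. A Python sketch under my own naming (real ticket locks spin on the serving counter; here threads sleep on a condition variable instead): tickets are served in FIFO order, so every thread that requests the lock eventually acquires it.

```python
import threading

class TicketLock:
    """FIFO hand-off makes acquisition starvation-free: every thread
    acquires the lock eventually, not merely some thread."""
    def __init__(self):
        self._next_ticket = 0
        self._now_serving = 0
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            my_ticket = self._next_ticket   # draw a ticket
            self._next_ticket += 1
            while self._now_serving != my_ticket:
                self._cond.wait()           # sleep instead of spinning

    def release(self):
        with self._cond:
            self._now_serving += 1          # serve the next ticket
            self._cond.notify_all()

counter = 0
lock = TicketLock()

def worker():
    global counter
    for _ in range(1000):
        lock.acquire()
        counter += 1                        # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

By contrast, a plain test-and-set lock is only deadlock-free: an unlucky thread can lose the race forever.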

SLIDE 55

Progress taxonomy

               | Non-blocking                      | Blocking
For everyone   | Wait freedom, Obstruction freedom | Starvation freedom
For someone    | Lock freedom                      | Deadlock freedom

SLIDE 56

Progress taxonomy

  • Independent conditions: progress does not depend on how the scheduler behaves
  • Dependent non-blocking conditions: progress relies on a thread eventually executing in isolation
  • Dependent blocking conditions: progress relies on fairness

SLIDE 57

Progress taxonomy

               | Non-blocking, Independent | Non-blocking, Dependent | Blocking, Dependent
For everyone   | Wait freedom              | Obstruction freedom     | Starvation freedom
For someone    | Lock freedom              | Clash freedom           | Deadlock freedom

SLIDE 58

Progress taxonomy

               | Non-blocking, Independent | Non-blocking, Dependent | Blocking, Dependent
For everyone   | Wait freedom              | Obstruction freedom     | Starvation freedom
For someone    | Lock freedom              | Clash freedom           | Deadlock freedom

  • Clash freedom is the Einsteinium of progress conditions: it does not exist in nature and (maybe) has no “commercial” value
  • Clash freedom is a strictly weaker property than obstruction freedom
SLIDE 59

  • Correctness conditions
  • Progress conditions
  • Performance

SLIDE 60

The cost of synchronization

[figure: a waiting thread (…zZz…) blocked on the SHARED RESOURCE]

SLIDE 61

The cost of synchronization

[chart: ideal speedups (1x, 2x, 4x) vs. measured speedups (1.5x, 1.8x)]

SLIDE 62

Amdahl Law – Fixed-size Model (1967)


SLIDE 63

Amdahl Law – Fixed-size Model (1967)

  • The workload is fixed: it studies how the behavior of the same program varies when adding more computing power

S_Amdahl = T_s / T_p = T_s / (β·T_s + (1 − β)·T_s / p) = 1 / (β + (1 − β)/p)

  • where:
  • β ∈ [0,1]: serial fraction of the program
  • p ∈ ℕ: number of processors
  • T_s: serial execution time
  • T_p: parallel execution time
  • It can be expressed as well vs. the parallel fraction P = 1 − β
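The formula is easy to evaluate numerically; a small sketch (the function name is mine):

```python
def amdahl_speedup(beta, p):
    """Amdahl's fixed-size speedup: serial fraction beta, p processors."""
    return 1.0 / (beta + (1.0 - beta) / p)

# With beta = 0.2, four processors give 1/(0.2 + 0.8/4) = 2.5x,
# and no processor count can push the speedup past 1/beta = 5x.
```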

SLIDE 64

Amdahl Law – Fixed-size Model (1967)


SLIDE 65

How real is this?

lim_{p→∞} S_Amdahl = lim_{p→∞} 1 / (β + (1 − β)/p) = 1/β

  • So if the sequential fraction is 20%, we have:

lim_{p→∞} S_Amdahl = 1/0.2 = 5

  • Speedup 5 using infinite processors!

SLIDE 66

Fixed-time model


SLIDE 67

Gustafson Law—Fixed-time Model (1989)

  • The execution time is fixed: it studies how the behavior of the scaled program varies when adding more computing power

W′ = β·W + (1 − β)·p·W
S_Gustafson = W′ / W = β + (1 − β)·p

  • where:
  • β ∈ [0,1]: serial fraction of the program
  • p ∈ ℕ: number of processors
  • W: original workload
  • W′: scaled workload
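Again a small numeric sketch (function name is mine):

```python
def gustafson_speedup(beta, p):
    """Gustafson's scaled speedup: serial fraction beta, p processors."""
    return beta + (1.0 - beta) * p

# With beta = 0.2 and p = 4, the scaled speedup is 0.2 + 0.8*4 = 3.4x:
# unlike Amdahl's fixed-size model, it grows without bound in p.
```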

SLIDE 68

Speed-up according to Gustafson


SLIDE 69

Memory-bounded model


SLIDE 70

Sun Ni Law—Memory-bounded Model (1993)

  • The workload is scaled, bounded by memory

S_Sun-Ni = (sequential time for W*) / (parallel time for W*)
         = (β·W + (1 − β)·G(p)·W) / (β·W + (1 − β)·G(p)·W / p)
         = (β + (1 − β)·G(p)) / (β + (1 − β)·G(p)/p)

  • where:
  • G(p) describes the workload increase as the memory capacity increases
  • W* = β·W + (1 − β)·G(p)·W

SLIDE 71

Speed-up according to Sun Ni

S_Sun-Ni = (β + (1 − β)·G(p)) / (β + (1 − β)·G(p)/p)

  • If G(p) = 1: S_Sun-Ni = 1 / (β + (1 − β)/p) = S_Amdahl
  • If G(p) = p: S_Sun-Ni = β + (1 − β)·p = S_Gustafson
  • In general G(p) > p gives a higher scale-up
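The two special cases can be verified numerically; a sketch (names mine):

```python
def sun_ni_speedup(beta, p, g):
    """Memory-bounded speedup; g(p) models workload growth with memory."""
    gp = g(p)
    return (beta + (1.0 - beta) * gp) / (beta + (1.0 - beta) * gp / p)

# g(p) = 1 recovers Amdahl's fixed-size speedup;
# g(p) = p recovers Gustafson's fixed-time speedup.
```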

SLIDE 72

Superlinear speedup

  • Can we have a Speed-up > p ? Yes!
  • Workload increases more than computing power (G(p) > p)
  • Cache effect: larger accumulated cache size. More or even all of the working set can fit into caches, and the memory access time reduces dramatically
  • RAM effect: enables the dataset to move from disk into RAM, drastically reducing the time required, e.g., to search it

SLIDE 73

Scalability

  • Efficiency: E = speedup / #processors
  • Strong scalability: efficiency is kept fixed while increasing the number of processes and keeping the problem size fixed
  • Weak scalability: efficiency is kept fixed while increasing the problem size and the number of processes at the same rate
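A tiny sketch of the efficiency metric (function name is mine):

```python
def efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors

# A 3.4x speedup on 4 processors gives efficiency 0.85. Strong scaling asks
# this to stay flat as processors grow with a fixed problem size; weak
# scaling, as the problem size grows at the same rate as the processors.
```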

SLIDE 74

Recommended readings

  • Linearizability: A correctness condition for concurrent objects
  • M. Herlihy et al., ACM Trans. Program. Lang. Syst.
  • On the nature of progress
  • M. Herlihy et al., OPODIS’11
  • Another view on parallel speedup
  • X. Sun et al., IEEE Supercomputing Conference, 1990