Memory Model, COS 597C, 10/5/2010



slide-1
SLIDE 1

Memory Model

COS 597C 10/5/2010

slide-2
SLIDE 2

Example

Initially: a = Flag = 0

Thread
  a = 26;
  Flag = 1;

Memory Model, COS 597C, Fall 2010

slide-3
SLIDE 3

Example: Compiler Transformation

Initially: a = Flag = 0

Thread (as written)    Thread (after compiler transformation)
  a = 26;                Flag = 1;
  Flag = 1;              a = 26;

slide-4
SLIDE 4

Example

Initially: a = Flag = 0

Thread 1          Thread 2
  a = 26;           while (Flag != 1) {};
  Flag = 1;         b = a;

What is the value of b after execution? 26? It can also be 0!

slide-7
SLIDE 7

How could this happen?

• Compilers can reorder instructions

Initially: a = Flag = 0

Thread 1 (as written)    Thread 1 (after reordering)    Thread 2
  a = 26;                  Flag = 1;                      while (Flag != 1) {};
  Flag = 1;                a = 26;                        b = a;

One execution order that yields b = 0: (1) Flag = 1; (2) Thread 2 leaves the loop; (3) b = a reads 0; (4) a = 26.

slide-9
SLIDE 9

How could this happen?

• Let's disable compiler reordering. How about now?

Initially: a = Flag = 0

Thread 1          Thread 2
  a = 26;           while (Flag != 1) {};
  Flag = 1;         b = a;

b can still be 0!

slide-11
SLIDE 11

How could this happen?

• Hardware out-of-order execution

Initially: a = Flag = 0

Thread 1          Thread 2
  a = 26;           while (Flag != 1) {};
  Flag = 1;         b = a;

b can still be 0!

Reorder buffer of P1 (program order):      a = 26; Flag = 1; ……
Reorder buffer of P1 (after reordering):   Flag = 1; a = 26; ……
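The effect of reordering Thread 1's two stores can be checked with a small single-threaded model. This is only a sketch under simplifying assumptions: Thread 2 is assumed to leave its spin loop and read a at any moment after Flag becomes 1, and the names Op, kProgramOrder, kReordered and possible_b are hypothetical, not from the slides.

```cpp
#include <array>
#include <cassert>
#include <set>

// Hypothetical model of Thread 1's two stores (not part of the slides).
enum class Op { WriteA, WriteFlag };

const std::array<Op, 2> kProgramOrder = {Op::WriteA, Op::WriteFlag};
const std::array<Op, 2> kReordered    = {Op::WriteFlag, Op::WriteA};

// Returns every value b can take, given the order in which Thread 1's
// stores take effect. Thread 2 is modeled as reading a at any point
// after it has observed Flag == 1.
std::set<int> possible_b(const std::array<Op, 2>& thread1_order) {
    assert(thread1_order[0] != thread1_order[1]);  // each store happens once
    std::set<int> results;
    int a = 0, flag = 0;  // initially a = Flag = 0
    for (Op op : thread1_order) {
        if (op == Op::WriteA) a = 26; else flag = 1;
        if (flag == 1) results.insert(a);  // Thread 2 may read a now
    }
    return results;
}
```

In this model, possible_b(kProgramOrder) contains only 26, while possible_b(kReordered) also contains 0, which matches the surprising outcome on the slides.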

slide-13
SLIDE 13

Things could go crazy if we don't define what a valid optimization is.

slide-14
SLIDE 14

What is a Memory (Consistency) Model?

• “A formal specification of how the memory system will appear to the programmer, eliminating the gap between the behavior expected by the programmer and the actual behavior supported by a system.” [Adve’ 1995]

• A memory model specifies:
  • How threads interact through memory
  • What value a read can return
  • When a value update becomes visible to other threads
  • What assumptions one may make about memory when writing a program or applying a program optimization

slide-15
SLIDE 15

Why do We Care?

• The memory model affects:
  • Programmability
  • Performance
  • Portability

[Figure: Program, Compiler, Machine Code, JIT, Hardware; labeled Memory Model 1 and Memory Model 2]

slide-16
SLIDE 16

The Single Thread Model

• Memory accesses execute one at a time in program order
• A read returns the value of the last write
• For hardware and compiler reordering:
  • Optimizations must respect data/control dependences
  • Memory operations must follow the order in which the program is written
• Easy to program and optimize

slide-17
SLIDE 17

Strict Consistency Model

• Any read to memory location X returns the value stored by the latest write to X

[Timeline figure, three variants: Threads 1 and 2 execute X = 1; R1 = X; R2 = X at different points on the timeline. X = 1 in all variants; the first variant shows R1 = 1 and R2 = 1, the other two show only R2 = 1.]

slide-20
SLIDE 20

Sequential Consistency

• Definition [Lamport’ 1979]: the result of any execution is the same as if:
  • The operations of each thread appear in program order
  • The operations of all threads were executed in some sequential order, atomically
• Atomicity
  • Isolation: no one sees a partial memory update
  • Serialization: memory accesses appear to occur at the same time for everyone

slide-21
SLIDE 21

Under Sequential Consistency Model

• The operations of each thread appear in program order
• The operations of all threads were executed in some sequential order, atomically

[Timeline figure, two variants: Threads 1 and 2 execute X = 1; R1 = X; R2 = X. X = 1 in both variants; the first shows only R2 = 1, the second shows only R1 = 1.]

slide-23
SLIDE 23

Example

• Dekker's algorithm for critical sections

Initially: Flag1 = Flag2 = 0

Thread 1                         Thread 2
  Flag1 = 1;                       Flag2 = 1;
  if (Flag2 == 0) critical         if (Flag1 == 0) critical

[Build: value boxes for Flag1 and Flag2 are filled in step by step (Flag1 = 1, then Flag2 = 1). The last step, which shows only Flag2 = 1, is marked "Violation!"]
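Why sequential consistency protects this version of Dekker's algorithm can be checked by brute force. The sketch below enumerates every interleaving of the four operations that keeps each thread's program order and reports whether both threads could ever enter the critical section. The function name sc_allows_both_in_critical and the bitmask encoding are assumptions made for illustration.

```cpp
#include <bitset>
#include <cassert>

// Enumerates all sequentially consistent interleavings of
//   Thread 1: Flag1 = 1; if (Flag2 == 0) critical
//   Thread 2: Flag2 = 1; if (Flag1 == 0) critical
// Bit i of mask selects which thread executes step i.
bool sc_allows_both_in_critical() {
    for (unsigned mask = 0; mask < 16; ++mask) {
        // Each thread must take exactly two of the four steps.
        if (std::bitset<4>(mask).count() != 2) continue;
        int flag[2] = {0, 0};              // Flag1, Flag2
        int pc[2] = {0, 0};                // next operation of each thread
        bool entered[2] = {false, false};
        for (int step = 0; step < 4; ++step) {
            int t = (mask >> step) & 1;    // thread that runs this step
            assert(pc[t] < 2);
            if (pc[t] == 0) flag[t] = 1;                 // write own flag
            else entered[t] = (flag[1 - t] == 0);        // read other flag
            ++pc[t];
        }
        if (entered[0] && entered[1]) return true;
    }
    return false;
}
```

The function returns false: in any single total order that respects program order, at least one thread sees the other's flag already set.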

slide-27
SLIDE 27

How do we violate sequential consistency?

Very easily! Let's take a look at several hardware/compiler optimizations that are commonly used on uniprocessors.

slide-28
SLIDE 28

Violation of SC: Architecture without Caches

• Write buffers with bypassing

Thread 1                         Thread 2
  Flag1 = 1;                       Flag2 = 1;
  if (Flag2 == 0) critical         if (Flag1 == 0) critical

[Figure: T1 and T2 each have a write buffer and share a bus to memory, which holds Flag1 and Flag2.]

Sequence of events:
  (1) T1 reads Flag2 = 0
  (2) T2 reads Flag1 = 0
  (3) T1 writes Flag1
  (4) T2 writes Flag2
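The sequence above can be replayed in a tiny single-threaded simulation. This is a sketch only: it models one pending write per processor, lets reads go straight to memory (bypassing the buffered write to a different address), and fixes one particular schedule. The names DekkerOutcome and run_dekker are hypothetical.

```cpp
#include <array>
#include <cassert>

struct DekkerOutcome {
    bool t1_in_critical;
    bool t2_in_critical;
};

// Replays: both threads issue their flag write, then both read the other
// flag, then any buffered writes drain to memory.
DekkerOutcome run_dekker(bool write_buffers) {
    std::array<int, 2> memory = {0, 0};          // Flag1, Flag2 in memory
    std::array<int, 2> buffer = {0, 0};          // one pending write each
    std::array<bool, 2> pending = {false, false};

    auto write_own_flag = [&](int t) {
        if (write_buffers) { buffer[t] = 1; pending[t] = true; }
        else memory[t] = 1;                      // write reaches memory at once
    };
    // The read targets the other flag, so it bypasses the local buffer.
    auto read_other_flag = [&](int t) { return memory[1 - t]; };

    write_own_flag(0);
    write_own_flag(1);
    bool t1 = (read_other_flag(0) == 0);         // (1) T1 reads Flag2
    bool t2 = (read_other_flag(1) == 0);         // (2) T2 reads Flag1
    for (int t = 0; t < 2; ++t) {                // (3), (4) buffers drain
        if (pending[t]) { memory[t] = buffer[t]; pending[t] = false; }
    }
    assert(!pending[0] && !pending[1]);
    return {t1, t2};
}
```

With write_buffers set to true, both threads enter the critical section; with it set to false, the same schedule lets neither thread in.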

slide-33
SLIDE 33

Violation of SC: Architecture without Caches

• Overlapping writes

Initially: Flag = a = 0

Thread 1          Thread 2
  a = 26;           while (Flag == 0) {};
  Flag = 1;         b = a;

[Figure: T1 and T2 connected to memory, which holds Flag = 0 and a = 0.]

Sequence of events:
  (1) T1 writes Flag
  (2) T2 reads Flag
  (3) T2 reads a
  (4) T1 writes a

slide-37
SLIDE 37

Violation of SC: Architecture without Caches

• Non-blocking reads

Initially: Flag = a = 0

Thread 1          Thread 2
  a = 26;           while (Flag == 0) {};
  Flag = 1;         b = a;

[Figure: T1 and T2 connected to memory, which holds Flag = 0 and a = 0.]

Sequence of events:
  (1) T2 reads a
  (2) T1 writes a
  (3) T1 writes Flag
  (4) T2 reads Flag

slide-41
SLIDE 41

Architecture with Private Caches

To comply with sequential consistency, we need:

• A cache coherence protocol
  • A write is eventually made visible to all processors
  • Writes to the same location appear to be seen in the same order by all processors (serialization) [Gharachorloo’90]
• The ability to detect the completion of write operations
  • Acknowledgement messages
  • Invalidation or update messages
• The illusion of atomic writes

slide-42
SLIDE 42

Write Atomicity

Initially: A = B = C = 0

Thread 1     Thread 2     Thread 3                 Thread 4
  A = 1;       A = 2;       while (B != 1) {};       while (B != 1) {};
  B = 1;       C = 1;       while (C != 1) {};       while (C != 1) {};
                            R1 = A;                  R2 = A;

What are the values of R1 and R2 after execution?

• R1 = 1, R2 = 1
• R1 = 2, R2 = 2
• R1 = 1, R2 = 2: violation! Under sequential consistency, operations from all threads must appear in some sequential order, atomically.

slide-46
SLIDE 46

Compiler Optimizations that Violate SC

• Compiler reordering must respect data and control dependencies

• Code motion

Thread 1                                  Thread 2
  for (i = 0; i < 10; i++) *a = i;          while (true) b = *a;

The load from a cannot be moved out of the loop.

• Common sub-expression elimination

Thread 1          Thread 2
  a = 6;            c = a - 1;
  Flag = 1;         while (Flag == 0) {};
                    b = a - 1;

(a - 1) cannot be eliminated for the assignment of b.

• Register allocation

Thread 1          Thread 2
  a = 6;            while (Flag == 0) {};
  Flag = 1;         b = a;

Flag cannot be allocated to a register.

slide-49
SLIDE 49

Sequential Consistency: Summary

• Sequential consistency does not guarantee data-race freedom
• Some hardware/compiler optimizations are still allowed:
  • Hardware/software prefetching
  • Speculating read values
• Determining which instructions are allowed to be reordered remains an open question

Thread 1     Thread 2
  A = 1;       A = 2;
  B = 1;       C = 1;

Data race:
  • two memory accesses to the same location
  • at least one is a write
  • they can occur simultaneously
slide-50
SLIDE 50

Relaxed Memory Models

• Key points
  • Program order for different memory addresses
  • Write atomicity
• Possible relaxations
  • Relaxation of program order (different memory locations)
    • Relax write-to-read program order
    • Relax write-to-write program order
    • Relax read-to-read and read-to-write program order
  • Relaxation of write atomicity
    • Read others' write early
    • Read own write early
• Safety nets, such as a fence: …… code; fence; code …… (code may not be reordered across the fence ✗)
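As an illustration of a fence used as a safety net, the following sketch rewrites the earlier a/Flag example with C++11 relaxed atomics plus explicit fences. It assumes a C++11 (or later) compiler with thread support; the function name run_with_fences is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the two-thread example once and returns the value read into b.
int run_with_fences() {
    int a = 0;                     // plain data
    std::atomic<int> flag{0};      // accessed with relaxed ordering only
    int b = -1;

    std::thread producer([&] {
        a = 26;
        // Release fence: the write to a may not be reordered after
        // the store to flag that follows.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(1, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        while (flag.load(std::memory_order_relaxed) != 1) {}
        // Acquire fence: the read of a may not be reordered before
        // the load of flag that precedes it.
        std::atomic_thread_fence(std::memory_order_acquire);
        b = a;
    });
    producer.join();
    consumer.join();
    assert(b != -1);               // the consumer has run
    return b;
}
```

With both fences in place the function returns 26. Dropping either fence would leave the accesses to a unordered with respect to flag, which makes the program racy.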

slide-51
SLIDE 51

Major Relaxed Hardware Models

(✔ = relaxation allowed, - = not allowed; column placement of the check marks reconstructed after the table in Adve's tutorial)

Model      W->R  W->W  R->RW  Read others' write early  Read own write early  Safety net
SC         -     -     -      -                         ✔
IBM 370    ✔     -     -      -                         -                     Serial inst
TSO (x86)  ✔     -     -      -                         ✔                     RMW, fence
PC         ✔     -     -      ✔                         ✔                     RMW
PSO        ✔     ✔     -      -                         ✔                     RMW
WO         ✔     ✔     ✔      -                         ✔                     synch
RCsc       ✔     ✔     ✔      -                         ✔                     lock, nsync, RMW
RCpc       ✔     ✔     ✔      ✔                         ✔
Alpha      ✔     ✔     ✔      -                         ✔                     MB, WMB
RMO        ✔     ✔     ✔      -                         ✔                     MEMBAR
PowerPC    ✔     ✔     ✔      ✔                         ✔                     synch

slide-52
SLIDE 52

Processor Consistency

• Writes done by a single processor are received by other processors in the same order as they are issued.
• Writes from different processors may be seen in different orders by different processors.

Initially: A = B = C = 0

Thread 1     Thread 2     Thread 3                 Thread 4
  A = 1;       A = 2;       while (B != 1) {};       while (B != 1) {};
  B = 1;       C = 1;       while (C != 1) {};       while (C != 1) {};
                            R1 = A;                  R2 = A;

R1 = 1, R2 = 2 is allowed ✔

slide-53
SLIDE 53

Weak Ordering Model [Dubois’ 86]

• Classification of memory operations
  • Data operations: load, store, …
  • Synchronization operations: lock, unlock, etc.
• How does it work?
  • All previously issued operations must complete on all processors before a synchronization operation executes
  • Synchronization operations must execute in program order
  • Memory operations between synchronization operations can be reordered

slide-54
SLIDE 54

Data-Race-Free-0 Model

• A program is data-race-free on a particular input if no sequentially consistent execution results in a data race
• A new definition of weak ordering [Adve’90 ISCA]
• Advantages:
  • Simple programmability of sequential consistency
  • Implementation flexibility of relaxed models
• Sequential consistency for DRF programs is widely used
  • C++ memory model
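A minimal sketch of a data-race-free program in C++: two threads update a shared counter, and every access is protected by the same mutex, so no sequentially consistent execution contains a race. The function name count_with_lock is hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Two threads each add increments_per_thread to a shared counter.
// All accesses to counter happen while holding m, so there is no data race.
int count_with_lock(int increments_per_thread) {
    assert(increments_per_thread >= 0);
    int counter = 0;
    std::mutex m;
    auto work = [&] {
        for (int i = 0; i < increments_per_thread; ++i) {
            std::lock_guard<std::mutex> guard(m);  // lock ... unlock
            ++counter;                             // data operation
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Because the program is race-free, it behaves as if it were sequentially consistent: count_with_lock(1000) returns 2000. Without the lock, the increments would race and the result would be unpredictable.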

slide-55
SLIDE 55

Relaxed Memory Model: Summary

• Relaxed memory models
  • Relax constraints on the order of some memory operations
  • Allow some hardware/compiler optimizations
• Why use a relaxed memory model? Performance
• Why not use a relaxed memory model? Complexity

slide-56
SLIDE 56

CASE STUDY: The C++ Memory Model

• Adaptation of DRF0
  • Sequential consistency for race-free programs
  • The behavior of a program with a data race is undefined (no benign data races in C++)
• Data operations: load, store
• Synchronization operations: lock, unlock, atomic load, atomic store, atomic read-modify-write
• Atomic operations must appear sequentially consistent

[Boehm’s PLDI 2008: Foundations of the C++ Concurrency Memory Model]
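To illustrate atomics that appear sequentially consistent, the sketch below runs the Dekker-style flag test from the earlier slides with std::atomic and its default ordering (memory_order_seq_cst). The function name count_mutual_exclusion_violations is hypothetical, and a run of a few hundred rounds is a demonstration, not a proof.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Repeats the two-flag entry test and counts the rounds in which both
// threads would have entered the critical section.
int count_mutual_exclusion_violations(int rounds) {
    assert(rounds > 0);
    int violations = 0;
    for (int r = 0; r < rounds; ++r) {
        std::atomic<int> flag1{0}, flag2{0};
        bool in1 = false, in2 = false;
        // store() and load() default to memory_order_seq_cst.
        std::thread t1([&] { flag1.store(1); in1 = (flag2.load() == 0); });
        std::thread t2([&] { flag2.store(1); in2 = (flag1.load() == 0); });
        t1.join();
        t2.join();
        if (in1 && in2) ++violations;
    }
    return violations;
}
```

With the default ordering the count stays at 0. Weakening the operations to memory_order_relaxed would permit the write-buffer outcome in which both threads read 0.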

slide-57
SLIDE 57

CASE STUDY: The C++ Memory Model

• Compiler code reordering is allowed for memory operations M1 and M2 when:
  • M1 is a data operation and M2 is a read synchronization operation
  • M1 is a write synchronization operation and M2 is a data operation
  • M1 and M2 are both data operations with no synchronization sequence-ordered between them
  • M1 is a data operation and M2 is the write of a lock operation
  • M1 is an unlock and M2 is either a read or a write of a lock
• Hardware optimization is allowed for non-atomic writes

slide-58
SLIDE 58

CASE STUDY: The C++ Memory Model

• Semantics of trylock

Thread 1          Thread 2
  X = 42;           while (trylock(l) == success)
  lock(l);            unlock(l);
                    assert(X == 42);

Can the assertion fail?

• Yes, if the compiler reorders the code in Thread 1:

Thread 1
  lock(l);
  X = 42;

• We could use a fence, but that is unfair to properly used trylock
• Solution: in the C++ memory model, trylock is not guaranteed to reveal anything about the state of the lock
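In standard C++ this shows up in std::mutex::try_lock, which is allowed to fail spuriously even when the mutex is free. A failed attempt therefore tells the caller nothing about another thread's progress; only a successful attempt is meaningful. The sketch below shows a use that relies only on success; the function names add_with_try_lock and run_try_lock_demo are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Adds amount to shared_total under m, retrying try_lock until it succeeds.
// A failed try_lock is treated as "no information", never as "m is held".
void add_with_try_lock(std::mutex& m, int& shared_total, int amount) {
    for (;;) {
        if (m.try_lock()) {
            // We own the mutex; adopt it so it is released on scope exit.
            std::lock_guard<std::mutex> guard(m, std::adopt_lock);
            shared_total += amount;
            return;
        }
        std::this_thread::yield();  // let other threads make progress
    }
}

// Two threads each perform adds_per_thread additions of 1.
int run_try_lock_demo(int adds_per_thread) {
    assert(adds_per_thread >= 0);
    std::mutex m;
    int total = 0;
    auto work = [&] {
        for (int i = 0; i < adds_per_thread; ++i) add_with_try_lock(m, total, 1);
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return total;
}
```

Code written this way stays correct whether or not try_lock fails spuriously, which is the kind of "properly used" trylock the slides refer to.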
slide-63
SLIDE 63

CASE STUDY: The JAVA Memory Model

• JAVA: the first language specification that attempts to incorporate a memory model
• What should JAVA do?
  • Define the semantics of all programs
  • Support execution of untrusted “sandboxed” code
• Sequential consistency for DRF programs
• Synchronization implemented using monitors
  • volatile
  • synchronized primitive
• The JAVA memory model does not guarantee deadlock freedom

slide-64
SLIDE 64

CASE STUDY: The JAVA Memory Model

• JAVA bugs found historically
  • Detached thread
  • Double-checked locking

Helper helper;
Helper getHelper() {
  if (helper == null) {            // first check, without the lock
    synchronized (this) {
      if (helper == null)          // second check, under the lock
        helper = new Helper();
    }
  }
  return helper;
}
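For comparison, here is one way the double-checked locking idiom can be written correctly with C++11 atomics. It is a sketch, not the fix adopted for JAVA: the pointer is a std::atomic, published with a release store and read with an acquire load, so a thread that sees a non-null pointer also sees the fully constructed object. The class name HelperHolder and the Helper fields are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Helper {
    int value = 42;  // placeholder payload for the example
};

class HelperHolder {
public:
    Helper* get() {
        // First check, without the lock; acquire pairs with the release below.
        Helper* h = helper_.load(std::memory_order_acquire);
        if (h == nullptr) {
            std::lock_guard<std::mutex> guard(mutex_);
            // Second check, under the lock; the mutex already orders this load.
            h = helper_.load(std::memory_order_relaxed);
            if (h == nullptr) {
                h = new Helper();
                // Publish only after construction has finished.
                helper_.store(h, std::memory_order_release);
            }
        }
        assert(h != nullptr);
        return h;
    }
    ~HelperHolder() { delete helper_.load(); }

private:
    std::atomic<Helper*> helper_{nullptr};
    std::mutex mutex_;
};
```

Repeated calls to get() on the same HelperHolder return the same pointer. In new C++ code a function-local static or std::call_once is usually simpler than hand-written double-checked locking.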

slide-65
SLIDE 65

Lessons Learnt from C++/JAVA

• SC for DRF is the minimal baseline
• Specifying semantics for programs with data races is extremely HARD
• Simple optimizations may introduce unintended consequences
• The state of the art is still broken
  • Abandon shared memory?
  • Hardware co-designed with high-level memory models?
  • Any volunteers for fixing the whole thing?

slide-66
SLIDE 66

Conclusion

• The memory model is very important and confusing
• The memory model specifies what hardware and compilers can and cannot do
• Sequential consistency is very intuitive, yet it limits performance
• Relaxed memory models allow some optimizations but introduce programming complexity
• Don't try to be clever, unless you are clever enough

slide-67
SLIDE 67

Advanced Topics

• Why threads cannot be implemented as a library
• Ongoing projects:
  • Deterministic Parallel JAVA (DPJ)
  • Functional languages
  • DeNovo hardware project

slide-68
SLIDE 68

Pthreads

• Some open-source pthread implementations

slide-69
SLIDE 69

References

• Boehm, “Foundations of the C++ Concurrency Memory Model”
• Pugh, “Fixing the JAVA Memory Model”
• Adve, “Shared Memory Consistency Models: A Tutorial”
• Dubois, “Memory Access Buffering in Multiprocessors”
• Boehm, “Threads Cannot Be Implemented as a Library”