SLIDE 1

Memory consistency models

Computer Architecture

J. Daniel García Sánchez (coordinator)
David Expósito Singh
Francisco Javier García Blas

ARCOS Group
Computer Science and Engineering Department
University Carlos III of Madrid

– Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 1/50

SLIDE 2

Memory consistency models Memory model

1 Memory model
2 Sequential consistency
3 Other consistency models
4 Use case: Intel
5 Conclusion

SLIDE 3

Memory consistency models Memory model

Memory consistency

[Figure: processors P1, P2, P3, and P4 connected to a shared memory]

Memory consistency model:

Set of rules defining how the memory system processes memory operations from multiple processors.
Contract between programmer and system.
Determines which optimizations are valid on correct programs.

SLIDE 4

Memory consistency models Memory model

Memory model

Interface between program and its transformers.

Defines which values can be returned by a read operation.

The language’s memory model has implications for hardware.

[Figure: language (C, C++, FORTRAN, ...) → compiler → machine code → hardware → executed code]

SLIDE 5

Memory consistency models Memory model

Single processor memory model

[Figure: processor P issuing STORE ... LOAD ... STORE ... LOAD to memory]

Memory behavior model:

Memory operations happen in program order.
A read returns the value from the last write in program order.

Semantics defined by sequential program order:

Simple but constrained reasoning.
Resolve data and control dependencies.
Independent operations may be executed in parallel.
Optimizations preserve semantics.

SLIDE 6

Memory consistency models Sequential consistency

SLIDE 7

Memory consistency models Sequential consistency

[Figure: processors P1 to P5 issuing LOAD ... STORE ... LOAD to a single memory]

A multiprocessor system is sequentially consistent if the result of any execution is the same as would be obtained if the operations from all processors were executed in some sequential order, and the operations from each individual processor appear in that sequence in the order established by its program.

Leslie Lamport, 1979
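The definition can be read operationally: enumerate every interleaving that keeps each processor's program order and collect the results. The following is a minimal sketch for a hypothetical two-processor test (P1: x=1; r1=y and P2: y=1; r2=x, with x and y initially 0). The names `Op`, `State`, `explore`, and `sc_outcomes` are illustrative and do not come from the slides.

```cpp
#include <cassert>
#include <set>
#include <utility>

// One memory operation of the toy program: a store of 1, or a load into the
// processor's register. addr 0 stands for x, addr 1 stands for y.
struct Op { bool is_store; int addr; };

// P1: x = 1; r1 = y        P2: y = 1; r2 = x
const Op kProgram[2][2] = {{{true, 0}, {false, 1}},
                           {{true, 1}, {false, 0}}};

struct State {
    int pc[2] = {0, 0};   // next operation of each processor
    int mem[2] = {0, 0};  // shared memory: x, y
    int reg[2] = {0, 0};  // r1, r2
};

// Tries every interleaving that respects program order within each processor.
void explore(State s, std::set<std::pair<int, int>>& results) {
    if (s.pc[0] == 2 && s.pc[1] == 2) {
        results.insert(std::make_pair(s.reg[0], s.reg[1]));
        return;
    }
    for (int p = 0; p < 2; ++p) {
        if (s.pc[p] == 2) continue;  // this processor has finished
        State next = s;
        const Op& op = kProgram[p][next.pc[p]];
        if (op.is_store) next.mem[op.addr] = 1;
        else next.reg[p] = next.mem[op.addr];
        ++next.pc[p];
        explore(next, results);
    }
}

// All (r1, r2) results that a sequentially consistent system may produce.
std::set<std::pair<int, int>> sc_outcomes() {
    std::set<std::pair<int, int>> results;
    explore(State{}, results);
    return results;
}
```

Under these assumptions the enumeration yields three results, and r1=0 with r2=0 is not among them: some store always comes first in the single sequential order.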

SLIDE 8

Memory consistency models Sequential consistency

Sequential Consistency: Constraints

Program order.

Memory operations from a program must become visible to all processes in program order.

Atomicity.

The total execution order must be consistent across processes, which requires all operations to be atomic.

None of the operations a processor performs after seeing the new value of a write become visible to other processes until those processes have also seen the value of that write.

SLIDE 9

Memory consistency models Sequential consistency

Atomicity

P1: a=1
P2: while(a==0) {} b=1
P3: while(b==0) {} x=a

Non-atomic writes:

The write to b could reach P3 before the write to a does, so P3 leaves its while loop and its read of a bypasses the write to a.
Result: x=0.

Atomic writes:

Sequential consistency is preserved.
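As an illustration, the three-process example can be sketched in C++ with `std::atomic`, whose default ordering is sequentially consistent. The function name `run_atomicity_example` and the use of threads to stand for processes are assumptions made for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the three-process example once and returns the value read into x.
int run_atomicity_example() {
    std::atomic<int> a{0}, b{0};
    int x = -1;  // written only by p3, read after all threads are joined

    std::thread p1([&] { a.store(1); });
    std::thread p2([&] {
        while (a.load() == 0) { std::this_thread::yield(); }
        b.store(1);
    });
    std::thread p3([&] {
        while (b.load() == 0) { std::this_thread::yield(); }
        x = a.load();
    });

    p1.join();
    p2.join();
    p3.join();
    return x;
}
```

With sequentially consistent atomics, p3 can only see b=1 after a=1 has become visible to it, so the function returns 1.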

SLIDE 10

Memory consistency models Sequential consistency

Sequential consistency constrains all memory operations:

Write → Read.
Write → Write.
Read → Read, Read → Write.

Simple model to reason about parallel programs.
But simple single-processor reorderings may violate the sequential consistency model:

Hardware reordering to improve performance.

Write buffers, overlapped writes, ...

Compiler optimizations apply transformations that reorder memory operations.

Scalar replacement, register allocation, instruction scheduling, ...

Transformations by programmers or refactoring tools may also modify program semantics.

SLIDE 11

Memory consistency models Sequential consistency

Sequential consistency violation

Initially: flag1=0; flag2=0;
P1: flag1=1; if (flag2==0) { critical section }
P2: flag2=1; if (flag1==0) { critical section }
assert(p1!=0 || p2!=0);

If caches use a write buffer:

Writes are delayed in the buffer.
Reads obtain the old value.
Dekker's algorithm is no longer valid.

Dekker's algorithm is the first known solution to the mutual exclusion problem.
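The flag protocol above can be sketched in C++ with sequentially consistent atomics, which rule out the write-buffer reordering. This is a minimal sketch, not a full Dekker implementation: `run_flag_protocol` and the `entered` counter are hypothetical names used only to observe how many threads reach the critical section.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns how many of the two threads entered the critical section (0, 1, or 2).
int run_flag_protocol() {
    std::atomic<int> flag1{0}, flag2{0};
    std::atomic<int> entered{0};

    std::thread p1([&] {
        flag1.store(1, std::memory_order_seq_cst);
        if (flag2.load(std::memory_order_seq_cst) == 0) {
            entered.fetch_add(1);  // stands for the critical section
        }
    });
    std::thread p2([&] {
        flag2.store(1, std::memory_order_seq_cst);
        if (flag1.load(std::memory_order_seq_cst) == 0) {
            entered.fetch_add(1);  // stands for the critical section
        }
    });

    p1.join();
    p2.join();
    return entered.load();
}
```

Because both stores and both loads take part in one total order, at least one thread must observe the other's flag, so the result is never 2. Replacing `seq_cst` with `relaxed` would allow both threads to read 0.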

SLIDE 12

Memory consistency models Sequential consistency

Program order

Initially: flag1=0; flag2=0;
P1: flag1=1; if (flag2==0) { critical section }
P2: flag2=1; if (flag1==0) { critical section }
assert(p1!=0 || p2!=0);

P1: Write flag1, 1; Read flag2 ← 0
P2: Write flag2, 1; Read flag1 ← 0?

SLIDE 13

Memory consistency models Sequential consistency

Program order

Initially: flag=0;
P1: A=42; flag=1
P2: while (flag!=1) {} X=A;

P1: Write A, 42; Write flag, 1
P2: Read flag ← 0; Read flag ← 1; Read A ← 0?
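A hedged C++ sketch of this flag/data pattern follows. It assumes the flag is a `std::atomic<int>` written with release ordering and read with acquire ordering, which is enough to keep the write to A ordered before the write to the flag. The function name `run_message_passing` is illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the producer/consumer pair once and returns the value read into X.
int run_message_passing() {
    int A = 0;                 // ordinary data, published through the flag
    std::atomic<int> flag{0};
    int X = -1;

    std::thread producer([&] {
        A = 42;
        flag.store(1, std::memory_order_release);  // A=42 is ordered before this store
    });
    std::thread consumer([&] {
        while (flag.load(std::memory_order_acquire) != 1) {
            std::this_thread::yield();
        }
        X = A;  // guaranteed to observe 42 once flag==1 has been seen
    });

    producer.join();
    consumer.join();
    return X;
}
```

If the flag were a plain `int`, the program would contain a data race and the compiler or hardware could reorder the two writes, which is the situation the slide questions.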

SLIDE 14

Memory consistency models Sequential consistency

Conditions for sequential consistency

Sufficient conditions:

Each process issues memory operations in program order.
After issuing a write, the issuing process waits for the write to complete before issuing another operation.
After issuing a read, the issuing process waits for the read to complete and for the write whose value is being read to complete.

Wait for write propagation to all processes.

Very demanding conditions.

These conditions are sufficient, not necessary: less demanding conditions may also guarantee sequential consistency.

SLIDE 15

Memory consistency models Other consistency models

SLIDE 16

Memory consistency models Other consistency models

Optimizations

Models relaxing program execution order.

W → R.
W → W.
R → W, R → R.

Notation:

X → Y: Y bypasses X.

SLIDE 17

Memory consistency models Other consistency models

Reorderings

[Table: reorderings allowed (R → R, R → W, W → R, W → W) for each processor: Alpha, PA-RISC, POWER, SPARC, x86, AMD64, IA64, zSeries]

SLIDE 18

Memory consistency models Other consistency models

Reads bypass writes (W→R)

A read may execute before a preceding write.
Typical in systems with a write buffer.

Check consistency with the buffer.
Allow reads from the buffer.

SLIDE 19

Memory consistency models Other consistency models

Other models

W → W, W → R.

Allow writes to arrive in memory out of program order.

R → W, W → R, R → R, W → W.

Only data and control dependencies within the processor are respected.
Alternatives:

Weak consistency.
Release consistency.

SLIDE 20

Memory consistency models Other consistency models

Weak ordering

Divides memory operations into data operations and synchronization operations.
Synchronization operations act as a barrier.

1 All data operations that precede a synchronization operation in program order must complete before the synchronization is executed.
2 All data operations that follow a synchronization operation in program order must wait until the synchronization is completed.
3 Synchronization operations are performed in program order.

Hardware implementation of barrier.

Processor keeps a counter:

Data operation issued ⇒ increment.
Data operation completed ⇒ decrement.
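The counter scheme can be modeled in a few lines. This is a software sketch of the idea only, not a description of any real hardware: the class `OutstandingCounter` and its method names are hypothetical.

```cpp
#include <cassert>

// Models the per-processor counter of data operations that have been issued
// but have not yet completed.
class OutstandingCounter {
public:
    // Data operation issued: increment.
    void issue_data_op() { ++pending_; }

    // Data operation completed: decrement.
    void complete_data_op() {
        assert(pending_ > 0);
        --pending_;
    }

    // A synchronization operation may execute only when no data operation
    // is still outstanding, that is, when the counter is back to zero.
    bool can_synchronize() const { return pending_ == 0; }

private:
    unsigned pending_ = 0;
};
```

A synchronization operation would stall while `can_synchronize()` is false, which enforces rule 1 above. Rule 2 would additionally require holding back later data operations until the synchronization completes.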

SLIDE 21

Memory consistency models Other consistency models

Release/acquire consistency

More relaxed than weak consistency.
Synchronization accesses are divided into:

Acquire.
Release.

Semantics:

Acquire

Must complete before all subsequent memory accesses.

Release

All previous memory accesses must complete first.
Subsequent memory accesses MAY initiate.
Operations that follow a release and must wait for it have to be protected with an acquire.
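A spinlock is a common place where acquire and release appear as a pair. The sketch below assumes C++ `std::atomic` orderings as a stand-in for the acquire and release accesses described above. `SpinLock` and `count_with_lock` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class SpinLock {
public:
    void lock() {
        // Acquire: accesses after lock() cannot move before this operation.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release: accesses before unlock() complete before the lock is seen as free.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Two threads increment a plain counter, each increment protected by the lock.
long count_with_lock(int per_thread) {
    SpinLock lock;
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

The counter itself is an ordinary variable. Its updates are ordered only by the acquire in `lock()` and the release in `unlock()`.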

SLIDE 22

Memory consistency models Use case: Intel

SLIDE 23

Memory consistency models Use case: Intel Consistency model

4 Use case: Intel
Consistency model
Examples
Model effects

SLIDE 24

Memory consistency models Use case: Intel Consistency model

Memory consistency in Intel

Until 2005, Intel had not completely clarified its memory consistency model.

Formalizing the model was highly complex.
Problems for language implementations (Java, C++, ...).

Currently the model is clarified and public.

SLIDE 25

Memory consistency models Use case: Intel Consistency model

Initial Intel model

i486 and Pentium:

Operations in program order.

Exception: read misses may bypass writes in the write buffer, but only if all those writes are cache hits.
In that case a read miss cannot match any buffered write.

SLIDE 26

Memory consistency models Use case: Intel Consistency model

Atomic operations

Since i486:

Read or write 1 byte.
Read or write a 16-bit aligned word.
Read or write a 32-bit aligned double word.

Since Pentium:

Read or write a 64-bit aligned quadword.
Non-cached memory access that fits in a 32-bit data bus.

Since P6:

Non-aligned access to data of 16, 32, or 64 bits that fits in a cache line.
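The P6 rule depends on whether an access stays inside one cache line. A minimal sketch of that check follows. The helper `fits_in_cache_line` is hypothetical, and the line size is a parameter because it depends on the processor (the 64 bytes in the usage note is only an assumed example).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when the byte range [addr, addr + size) lies inside a single cache
// line of line_size bytes. size must be at least 1.
bool fits_in_cache_line(std::uintptr_t addr, std::size_t size,
                        std::size_t line_size) {
    std::uintptr_t first_line = addr / line_size;
    std::uintptr_t last_line = (addr + size - 1) / line_size;
    return first_line == last_line;
}
```

For example, with an assumed 64-byte line, an 8-byte access at address 56 fits in one line, while the same access at address 60 straddles two lines and would not fall under this rule.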

SLIDE 27

Memory consistency models Use case: Intel Consistency model

Bus blocking (I)

A processor may issue a signal to block the bus.

Other elements cannot access the bus.

Automatic bus blocking:

Instruction XCHG.
Updating segment descriptors, page directory, and page table.
Interrupt acceptance.

SLIDE 28

Memory consistency models Use case: Intel Consistency model

Bus blocking (II)

Bus software blocking:

Use LOCK prefix in:

Instructions for bit checking and modification (BTS, BTR, BTC).
Exchange instructions (XADD, CMPXCHG, CMPXCHG8B).
1-operand arithmetic instructions (INC, DEC, NOT, NEG).
2-operand arithmetic-logic instructions (ADD, ADC, SUB, SBB, AND, OR, XOR).
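From a high-level language, these locked instructions are usually reached through atomic read-modify-write operations. The sketch below uses C++ `std::atomic`. On x86, compilers typically translate `fetch_add` and `compare_exchange_strong` into LOCK-prefixed XADD and CMPXCHG, but the exact code generated is an assumption that depends on the compiler. `count_atomically` and `try_claim` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two threads increment a shared counter with an atomic read-modify-write.
int count_atomically(int per_thread) {
    std::atomic<int> counter{0};
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            counter.fetch_add(1, std::memory_order_seq_cst);
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter.load();
}

// Compare-and-swap: sets target to desired only if it still holds expected.
// Returns true when the exchange took place.
bool try_claim(std::atomic<int>& target, int expected, int desired) {
    return target.compare_exchange_strong(expected, desired);
}
```

No increment is lost because each `fetch_add` reads, modifies, and writes the counter as one indivisible operation.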

SLIDE 29

Memory consistency models Use case: Intel Consistency model

Barrier instructions

LFENCE:

Barrier for load operations.
Every load preceding an LFENCE is made globally visible before any subsequent load.

SFENCE:

Barrier for store operations.
Every store preceding an SFENCE is globally visible before any subsequent store.

MFENCE:

Barrier for load/store operations.
All loads and stores preceding an MFENCE are globally visible before any subsequent load or store.
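In portable C++ the closest counterpart to a full barrier such as MFENCE is `std::atomic_thread_fence(std::memory_order_seq_cst)`. How a compiler implements it on x86 (MFENCE or a locked instruction) is an assumption that varies by compiler. The sketch below, with the hypothetical name `run_with_fences`, places a full fence between a store and a later load in each of two threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Each thread stores to one variable, executes a full fence, and then loads
// the other variable. Returns the two values read as (r1, r2).
std::pair<int, int> run_with_fences() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread p1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread p2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
        r2 = x.load(std::memory_order_relaxed);
    });

    p1.join();
    p2.join();
    return std::make_pair(r1, r2);
}
```

The two fences are totally ordered, so the load after the later fence must see the store before the earlier one. The result r1=0 and r2=0 is therefore excluded, while it would be allowed without the fences.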

SLIDE 30

Memory consistency models Use case: Intel Consistency model

Current memory model within processor (I)

Reads do not bypass other reads (R → R).
Writes do not bypass reads (R → W).
Writes do not bypass writes (W → W).

There are exceptions for strings and non-temporal moves.

Reads bypass preceding writes (W → R) to different addresses.
Reads/writes do not bypass I/O operations, locked instructions, or serializing instructions.

SLIDE 31

Memory consistency models Use case: Intel Consistency model

Current memory model within processor (II)

Reads cannot bypass preceding LFENCE or MFENCE.
Writes cannot bypass preceding LFENCE, SFENCE, or MFENCE.
LFENCE cannot bypass preceding reads.
SFENCE cannot bypass preceding writes.
MFENCE cannot bypass preceding reads or writes.

SLIDE 32

Memory consistency models Use case: Intel Consistency model

Multiprocessor memory model

Every processor is individually compliant with the former rules.
Writes from a processor are observed in the same order by all other processors.
Writes from a processor are NOT ordered with respect to writes from other processors.
Memory ordering is transitive.
Two writes are viewed in a consistent order by any other processor distinct from those two processors.
Locked instructions have a total order.

SLIDE 33

Memory consistency models Use case: Intel Examples

SLIDE 34

Memory consistency models Use case: Intel Examples

Example: Write ordering

Processor A

write A.1
write A.2
write A.3

Processor B

write B.1
write B.2
write B.3

Processor C

write C.1
write C.2
write C.3

Writes from every processor keep their order.

Possible order (I): Write A.1, Write B.1, Write B.2, Write C.1, Write A.2
Possible order (II): . . . Write B.3, Write A.3, Write C.2, Write C.3

Order for every process is kept.
No order is guaranteed across processes.
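The property in this example can be stated as a small check: a global order of writes is acceptable when the writes of each processor appear in increasing order, whatever the interleaving across processors. The sketch below uses hypothetical names (`Write`, `keeps_per_processor_order`) and identifies each write by processor letter and index.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// A write is identified by its processor ('A', 'B', 'C') and its index (1, 2, 3).
using Write = std::pair<char, int>;

// True when, for every processor, its writes appear in increasing index order.
// Nothing is required about the relative order of different processors.
bool keeps_per_processor_order(const std::vector<Write>& global_order) {
    std::map<char, int> last_seen;  // highest index seen so far per processor
    for (const Write& w : global_order) {
        auto it = last_seen.find(w.first);
        if (it != last_seen.end() && w.second <= it->second) {
            return false;  // an earlier write of this processor shows up late
        }
        last_seen[w.first] = w.second;
    }
    return true;
}
```

Applied to possible order (I), the check succeeds. An order that placed write A.2 before write A.1 would be rejected.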

SLIDE 35

Memory consistency models Use case: Intel Examples

No reordering R→R,W→W

Initial state: X=0, Y=0

Processor 1

MOV [_x], 1
MOV [_y], 1

Processor 2

MOV r1, [_y]
MOV r2, [_x]

State not allowed: r1=1 and r2=0

SLIDE 36

Memory consistency models Use case: Intel Examples

No reordering R→W

Initial state: X=0, Y=0

Processor 1

MOV r1, [_x]
MOV [_y], 1

Processor 2

MOV r2, [_y]
MOV [_x], 1

State not allowed: r1=1 and r2=1

SLIDE 37

Memory consistency models Use case: Intel Examples

Reordering W(a)→R(b)

Initial state: X=0, Y=0

Processor 1

MOV [_x], 1
MOV r1, [_y]

Processor 2

MOV [_y], 1
MOV r2, [_x]

State allowed: r1=0 and r2=0

SLIDE 38

Memory consistency models Use case: Intel Examples

No reordering W→R

Initial state: X=0

Processor 1

MOV [_x], 1
MOV r1, [_x]

State not allowed: r1=0

SLIDE 39

Memory consistency models Use case: Intel Examples

Write visibility from other processor

Initial state: X=0, Y=0

Processor 1

MOV [_x], 1
MOV r1, [_x]
MOV r2, [_y]

Processor 2

MOV [_y], 1
MOV r3, [_y]
MOV r4, [_x]

State allowed: r2=0 and r4=0

Writes may be perceived in a different order by each processor.

SLIDE 40

Memory consistency models Use case: Intel Examples

Transitive visibility of writes

Initial state: X=0, Y=0

Processor 1

MOV [_x], 1

Processor 2

MOV r1, [_x]
MOV [_y], 1

Processor 3

MOV r2, [_y]
MOV r3, [_x]

State not allowed: r1=1 and r2=1 and r3=0

SLIDE 41

Memory consistency models Use case: Intel Examples

Consistent order of writes for all processors

Initial state: X=0, Y=0

Processor 1

MOV [_x], 1

Processor 2

MOV [_y], 1

Processor 3

MOV r1, [_x]
MOV r2, [_y]

Processor 4

MOV r3, [_y]
MOV r4, [_x]

State not allowed: r1=1 and r2=0 and r3=1 and r4=0
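The four-processor example above can be sketched in C++ as follows. The sketch assumes sequentially consistent `std::atomic` operations, which give all threads a single order of the two writes. The names `Observed`, `run_two_writers_two_readers`, and `is_forbidden` are illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values read by the two reader threads.
struct Observed { int r1, r2, r3, r4; };

// Two writers store to x and y; two readers load them in opposite orders.
Observed run_two_writers_two_readers() {
    std::atomic<int> x{0}, y{0};
    Observed o{-1, -1, -1, -1};

    std::thread p1([&] { x.store(1); });
    std::thread p2([&] { y.store(1); });
    std::thread p3([&] { o.r1 = x.load(); o.r2 = y.load(); });
    std::thread p4([&] { o.r3 = y.load(); o.r4 = x.load(); });

    p1.join();
    p2.join();
    p3.join();
    p4.join();
    return o;
}

// The state in which the two readers disagree on the order of the writes.
bool is_forbidden(const Observed& o) {
    return o.r1 == 1 && o.r2 == 0 && o.r3 == 1 && o.r4 == 0;
}
```

In the forbidden state, processor 3 would see the write to x first and processor 4 would see the write to y first, which contradicts a consistent order of writes.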

SLIDE 42

Memory consistency models Use case: Intel Examples

Locked instructions define total order

Initial state: r1=1, r2=1, X=0, Y=0

Processor 1

XCHG [_x], r1

Processor 2

XCHG [_y], r2

Processor 3

MOV r3, [_x]
MOV r4, [_y]

Processor 4

MOV r5, [_y]
MOV r6, [_x]

State not allowed: r3=1 and r4=0 and r5=1 and r6=0

SLIDE 43

Memory consistency models Use case: Intel Examples

Reads not reordered with locks

Initial state: X=0, Y=0, r1=1, r3=1

Processor 1

XCHG [_x], r1
MOV r2, [_y]

Processor 2

XCHG [_y], r3
MOV r4, [_x]

State not allowed: r2=0 and r4=0

SLIDE 44

Memory consistency models Use case: Intel Examples

Writes not reordered with locks

Initial state: X=0, Y=0, r1=1

Processor 1

XCHG [_x], r1
MOV [_y], 1

Processor 2

MOV r2, [_y]
MOV r3, [_x]

State not allowed: r2=1 and r3=0

SLIDE 45

Memory consistency models Use case: Intel Model effects

SLIDE 46

Memory consistency models Use case: Intel Model effects

Consistency models in Intel

Sequential consistency

Load: mov reg, [mem]
Store: xchg [mem], reg

Relaxed consistency

Load: mov reg, [mem]
Store: mov [mem], reg

Release/acquire consistency

Load: mov reg, [mem]
Store: mov [mem], reg
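The same three levels can be requested from C++ through `std::memory_order`. The sketch below is illustrative: `shared_value` and the function names are hypothetical, and the instructions mentioned in the comments are what x86 compilers typically emit, matching the mapping above, not a guarantee.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> shared_value{0};  // hypothetical shared variable

// Sequential consistency: the store typically becomes xchg [mem], reg.
void store_seq_cst(int v) { shared_value.store(v, std::memory_order_seq_cst); }
int load_seq_cst() { return shared_value.load(std::memory_order_seq_cst); }

// Release/acquire: a plain mov is typically enough on x86 in both directions.
void store_release(int v) { shared_value.store(v, std::memory_order_release); }
int load_acquire() { return shared_value.load(std::memory_order_acquire); }

// Relaxed: also a plain mov; only atomicity of the access is requested.
void store_relaxed(int v) { shared_value.store(v, std::memory_order_relaxed); }
int load_relaxed() { return shared_value.load(std::memory_order_relaxed); }
```

Only the sequentially consistent store needs more than a plain move, because the single reordering x86 performs is a read bypassing a preceding write to a different address.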

SLIDE 47

Memory consistency models Conclusion

SLIDE 48

Memory consistency models Conclusion

Summary

The memory consistency model determines which optimizations are valid.
Sequential consistency establishes atomicity and program order as constraints.
More relaxed models than sequential consistency can be used.

Weak consistency.
Release/acquire consistency.

The Intel memory model has evolved over the last decade.

Formalized and publicly available.
Establishes which operations are atomic, when the bus is blocked, and how barriers are defined.
Defines the memory model within a processor and between different processors.

SLIDE 49

Memory consistency models Conclusion

