

SLIDE 1

Dynamic Synthesis for Relaxed Memory Models

Feng Liu*, Nayden Nedev*, Nedyalko Prisadnikov†, Martin Vechev§, Eran Yahav‡

*Princeton University, † Sofia University, § ETH Zurich, ‡ Technion

06/13/2012 PLDI 2012, Beijing

SLIDE 2

Relaxed memory models

  • No sequential consistency (SC) in chips today
  • Chip designers implement “relaxed memory” on different architectures:

– Total store order (TSO): Intel’s and AMD’s x86; SPARC
– Partial store order (PSO): SPARC
– PPC model: IBM’s PowerPC; ARM

SLIDE 3

Modeling TSO & PSO Programs

  • Store Buffering

– FIFO queues (buffers) associated with threads
– A store goes to a local buffer, not memory
– Stores in buffers are flushed at non-deterministic times

  • Store Forwarding

– Satisfy loads from local buffer if possible
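The two mechanisms above can be sketched as a small model. This is an illustrative sketch only, not Fender's implementation: the class name `StoreBufferThread` and its methods are hypothetical, and a single FIFO per thread is assumed (the TSO-style case).

```python
from collections import deque


class StoreBufferThread:
    """One thread's view of memory with a FIFO store buffer (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory   # shared dict: variable name -> value
        self.buffer = deque()  # pending (variable, value) stores, oldest first

    def store(self, var, value):
        # Store buffering: the store goes to the local buffer, not to memory.
        self.buffer.append((var, value))

    def load(self, var):
        # Store forwarding: the newest buffered store to `var` satisfies the load.
        for buffered_var, value in reversed(self.buffer):
            if buffered_var == var:
                return value
        return self.memory[var]

    def flush_one(self):
        # Move the oldest buffered store to main memory, if there is one.
        if self.buffer:
            var, value = self.buffer.popleft()
            self.memory[var] = value


memory = {"H": 0}
t1 = StoreBufferThread(memory)
t2 = StoreBufferThread(memory)

t1.store("H", 1)
print(t1.load("H"))  # 1: t1 reads its own buffered store
print(t2.load("H"))  # 0: the store has not reached main memory yet
t1.flush_one()
print(t2.load("H"))  # 1: visible to t2 after the flush
```

A scheduler that calls `flush_one` at arbitrary points models the non-deterministic flush times.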

SLIDE 4

PSO Example

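Initially: H=0, Done=0

thread1:            thread2:
  H=1;                while (!Done) { }
  Done=1;             assert(H==1);

[Figure: store buffers of t1 and t2 around main memory, each holding H and Done. The store Done=1 is flushed to main memory while H=1 is still buffered, so thread2 loads Done=1 and then loads H=0.]

Fails on PSO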
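The failing execution can be replayed in a few lines. This is a simplified sketch under stated assumptions: PSO is modeled as one FIFO per variable for thread1, thread2 runs only after the chosen flushes, and the function name `run_pso_example` and its `flush_order` parameter are hypothetical.

```python
from collections import defaultdict, deque


def run_pso_example(flush_order):
    """Replay the two-thread example; `flush_order` names the per-variable
    buffers of thread1 that are flushed, in order, before thread2 runs."""
    memory = {"H": 0, "Done": 0}
    buffers = defaultdict(deque)  # PSO assumption: one FIFO per variable

    # thread1: H=1; Done=1;  (both stores are buffered, not written to memory)
    buffers["H"].append(1)
    buffers["Done"].append(1)

    # The memory system flushes the chosen buffers in the chosen order.
    for var in flush_order:
        if buffers[var]:
            memory[var] = buffers[var].popleft()

    # thread2: while (!Done) { }  then  assert(H==1);
    if not memory["Done"]:
        return "thread2 still spinning"
    return "assert holds" if memory["H"] == 1 else "assert fails"


print(run_pso_example(["H", "Done"]))  # assert holds: flushed in program order
print(run_pso_example(["Done"]))       # assert fails: Done overtakes H
```

Under a single FIFO per thread (TSO), Done could not reach memory before H, so the second call has no TSO counterpart.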

SLIDE 5

Memory Fences

  • A memory fence is very expensive (10-100 cycles)
  • Use fences only where necessary

Initially: H=0, Done=0

thread1:            thread2:
  H=1;                while (!Done) { }
  Fence;              assert(H==1);
  Done=1;
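A rough way to see why the fence helps is to enumerate the flush choices with and without it. In this sketch a fence is modeled as draining all of thread1's buffers before the next store; the names `outcomes` and `FLUSH_ORDERS` are hypothetical, and thread2 is assumed to run only after the listed flushes.

```python
from collections import deque

# Every way the memory system may flush thread1's per-variable buffers.
FLUSH_ORDERS = [(), ("H",), ("Done",), ("H", "Done"), ("Done", "H")]


def outcomes(with_fence):
    """Collect thread2's possible outcomes over all flush orders."""
    results = set()
    for order in FLUSH_ORDERS:
        memory = {"H": 0, "Done": 0}
        buffers = {"H": deque(), "Done": deque()}

        buffers["H"].append(1)          # thread1: H=1;
        if with_fence:                  # thread1: Fence; (drain all buffers)
            for var, buf in buffers.items():
                while buf:
                    memory[var] = buf.popleft()
        buffers["Done"].append(1)       # thread1: Done=1;

        for var in order:               # non-deterministic flushes
            if buffers[var]:
                memory[var] = buffers[var].popleft()

        if not memory["Done"]:          # thread2 never leaves the loop
            results.add("spinning")
        elif memory["H"] == 1:
            results.add("assert holds")
        else:
            results.add("assert fails")
    return results


print(sorted(outcomes(with_fence=False)))  # ['assert fails', 'assert holds', 'spinning']
print(sorted(outcomes(with_fence=True)))   # ['assert holds', 'spinning']
```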

SLIDE 6

Our Approach

FENDER: dynamic analysis & static fixing

  • Inputs: C/C++ program P, specification S, memory model M
  • Output: program P’ with fences
  • P’ satisfies S under M

SLIDE 7

A lock-free memory allocator

  • 771 lines of C code
  • 2699 lines of IR code

[1] M. Michael, “Scalable lock-free dynamic memory allocation,” PLDI’04.

Challenge: Handling real-world concurrent programs

SLIDE 8

Real-World Programs?

  • Exposing violations under relaxed memory models

– Violations occur rarely

  • Many possible fence placements

– Large programs

  • Written in C/C++ language

– Rather than program models

SLIDE 9

Contributions

  • Demonic scheduler to expose violations

– Delay flushes of values from store buffer to main memory

  • Avoiding bad executions by adding fences

– Extracting ordering constraints from bad executions
– Enforcing ordering constraints using fences

  • Parametric synthesis framework

– Different memory models

  • Evaluating fences required under different memory models and correctness criteria

– Found redundant and missing fences
– Linearizability on relaxed memory models
– Handled real C/C++ programs

SLIDE 10

Fender Framework – Support for concurrency and RMM

[Figure: Fender architecture]

  • Input: concurrent C/C++ code and client, compiled by LLVM-GCC to bitcode (.bc)
  • LLVM Interpreter with Threading, Demonic Scheduler and Memory Model
  • Legend: our extension; existing work

SLIDE 11

Our work – Dynamic analysis

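[Figure: Fender architecture, dynamic analysis part]

  • Input: concurrent C/C++ code and client, compiled by LLVM-GCC to bitcode (.bc)
  • LLVM Interpreter with Threading, Demonic Scheduler and Memory Model, producing a trace
  • Specification and Trace Analysis turn the trace into an order formula
  • SAT Solver returns a SAT assignment
  • Legend: our extension; existing work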

SLIDE 12

Our work – Implement memory fences

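[Figure: Fender architecture, fence enforcement part]

  • Input: concurrent C/C++ code and client, compiled by LLVM-GCC to bitcode (.bc)
  • LLVM Interpreter with Threading, Demonic Scheduler and Memory Model, producing a trace
  • Specification and Trace Analysis turn the trace into an order formula
  • SAT Solver returns a SAT assignment
  • Fence Enforcement produces modified .bc
  • Output: fixed bytecode & fence location report
  • Legend: our extension; existing work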

SLIDE 13

Example

Initially: H=0, Done=0

thread1:            thread2:
  L1: H=1;            L3: while (!Done) { }
  L2: Done=1;         L4: assert(H==1);

[Figure: store buffers of t1 and t2 around main memory, each holding H and Done.]

SLIDE 14

Interpretation on PSO

Memory operations recorded in the trace:

  • L1: Store H=1
  • L2: Store Done=1
  • L3: Load Done
  • L4: Load H

Trace legend: load, store, flush

[Figure: the example program from slide 13 next to the store-buffer diagram (t1, main memory, t2; variables H and Done), showing the value 1 stored at L1 and the value 1 stored at L2.]

SLIDE 15

Interpretation on PSO

Trace events shown: L3, L1, L3, L2

Trace legend: load, store, flush

[Figure: the example program from slide 13 next to the store-buffer diagram (t1, main memory, t2; variables H and Done).]

SLIDE 16

Flush with a probability

Trace events shown: L3, L1, L2, L3

Trace legend: load, store, flush

[Figure: the example program from slide 13 next to the store-buffer diagram (t1, main memory, t2; variables H and Done), showing the value 1 stored at L1.]
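Flushing with a probability can be sketched as follows. This is an assumption-laden illustration rather than Fender's scheduler: the names `run_once`, `count_violations`, `flush_prob` and `max_spins` are hypothetical, thread1 holds at most one pending store per variable, and each pending store is flushed independently with probability `flush_prob` at every scheduling point.

```python
import random


def run_once(rng, flush_prob, max_spins=1000):
    """One PSO run of the example. Returns True if assert(H==1) holds,
    False if it fails, None if thread2 never left its loop."""
    memory = {"H": 0, "Done": 0}
    pending = {}  # thread1's buffered stores, one slot per variable

    def maybe_flush():
        # Each buffered store reaches memory with probability flush_prob.
        for var in list(pending):
            if rng.random() < flush_prob:
                memory[var] = pending.pop(var)

    pending["H"] = 1                 # L1: H=1;
    maybe_flush()
    pending["Done"] = 1              # L2: Done=1;

    for _ in range(max_spins):       # L3: while (!Done) { }
        maybe_flush()
        if memory["Done"]:
            return memory["H"] == 1  # L4: assert(H==1);
    return None


def count_violations(flush_prob, runs=1000, seed=0):
    """Count runs in which the assertion fails."""
    rng = random.Random(seed)
    return sum(run_once(rng, flush_prob) is False for _ in range(runs))


print(count_violations(flush_prob=1.0))  # 0: every store is flushed immediately
print(count_violations(flush_prob=0.5))  # delayed flushes expose violations
```

Lowering `flush_prob` keeps stores in the buffer longer, which is the sense in which a scheduler of this kind delays flushes to make rare violations show up.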

SLIDE 17

Execution trace

Trace events shown: L3, L1, L2, L3, L4

Trace legend: load, store, flush

[Figure: the example program from slide 13 next to the store-buffer diagram (t1, main memory, t2; variables H and Done), showing the value 1 stored at L1.]

SLIDE 18

Checking Specification

[Figure: traces of different executions. Legend: load, store, flush.]

SLIDE 19

Repair one trace

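  • Order predicate: [L1, L2]
  • Order formula for a single execution: [L1, L2] ∨ [C, D] ∨ …

[Figure: a trace with L1 and L2 highlighted, with labels x1, x2, C and D, next to the store-buffer diagram (t1, main memory, t2; variables H and Done) showing the value 1 stored at L1 and the value 1 stored at L2.]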
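One way to read an order predicate off a bad trace is to look for a store that was flushed while an earlier store of the same thread was still buffered. The sketch below assumes that reading; the event format and the function name `order_predicates` are hypothetical and not taken from Fender.

```python
def order_predicates(trace):
    """Return pairs (a, b) of store labels from one thread where b was
    flushed while the earlier store a was still in the buffer."""
    buffered = {}     # thread -> labels of stores still buffered, program order
    predicates = []
    for kind, thread, label in trace:
        pending = buffered.setdefault(thread, [])
        if kind == "store":
            pending.append(label)
        elif kind == "flush":
            # Every older store still pending was overtaken by this flush.
            index = pending.index(label)
            for earlier in pending[:index]:
                if (earlier, label) not in predicates:
                    predicates.append((earlier, label))
            pending.pop(index)
    return predicates


bad_trace = [
    ("store", "t1", "L1"),  # H=1 enters t1's buffer
    ("store", "t1", "L2"),  # Done=1 enters t1's buffer
    ("flush", "t1", "L2"),  # Done reaches memory before H
    ("load", "t2", "L3"),   # thread2 sees Done=1
    ("load", "t2", "L4"),   # thread2 sees H=0, so the assertion fails
]
print(order_predicates(bad_trace))  # [('L1', 'L2')]
```

The disjunction of the returned pairs plays the role of the order formula for this one execution: enforcing any one of them rules the execution out.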

SLIDE 20

Repair all incorrect traces

  • Global formula to SAT solver: (x1 ∨ x2 ∨ ..) ∧ (x1 ∨ x3 ∨ ..) ∧ …

[Figure: traces of different executions (trace1, trace2, trace3), with the callout “One memory fence should be placed here”.]
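The global formula is a conjunction of one clause per bad trace, and a small satisfying assignment corresponds to a small set of fences. The sketch below swaps the SAT solver for brute-force enumeration, which is only workable for tiny inputs; the function name `minimal_fence_set` is hypothetical.

```python
from itertools import combinations


def minimal_fence_set(trace_formulas):
    """Smallest set of order predicates that satisfies every clause.

    `trace_formulas` holds one clause per bad execution; a clause is the
    set of predicates of which at least one must be enforced. Returns
    None if some clause is empty and therefore cannot be satisfied."""
    candidates = sorted(set().union(*trace_formulas))
    for size in range(len(candidates) + 1):
        for chosen in combinations(candidates, size):
            if all(clause & set(chosen) for clause in trace_formulas):
                return set(chosen)
    return None


# (x1 ∨ x2) ∧ (x1 ∨ x3): enforcing x1 alone repairs both traces.
print(minimal_fence_set([{"x1", "x2"}, {"x1", "x3"}]))  # {'x1'}
```

Each chosen predicate would then be enforced by a fence between the two program locations it names.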

SLIDE 21

Fix the program

Initially: H=0, Done=0

thread1:            thread2:
  L1: H=1;            L3: while (!Done) { }
      Fence;          L4: assert(H==1);
  L2: Done=1;

SLIDE 22

Evaluation - Benchmarks

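Columns: Program; then TSO and PSO under each of three specifications: memory safety, operational sequential consistency, linearizability. Empty cells did not survive extraction, so each row lists its surviving values in left-to-right order, not aligned to columns.

Work stealing queues

  • Chase-Lev WSQ: 1, 2, 2, 3
  • Cilk’s THE WSQ: 1, 3
  • FIFO WSQ: 2, 1, 2
  • LIFO WSQ: 1, 1
  • Anchor WSQ: 1, 1

Idempotent work stealing queues

  • FIFO iWSQ: 3
  • LIFO iWSQ: 2
  • Anchor iWSQ: 2

Concurrent data structures

  • MS2 Queue: no values extracted
  • MSN Queue: 1, 1
  • LazyList Set, Harris’s Set: 1, 1

Lock-free memory allocator

  • Memory allocator: 3, 4, 4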

SLIDE 23

Evaluation - Specifications

[Table: the benchmark table from slide 22 is repeated here.]

SLIDE 24

Evaluation - Memory models

[Table: the benchmark table from slide 22 is repeated here.]

SLIDE 25

Evaluation - number of memory fences

[Table: the benchmark table from slide 22 is repeated here.]

SLIDE 26

Conclusion

  • Demonic scheduler to expose violations
  • Avoiding bad executions by adding fences
  • Parametric synthesis framework
  • Evaluating fences required under different memory models and correctness criteria

SLIDE 27


Thanks! Q & A