
Debugging and improving the C/C++11 memory model (Viktor Vafeiadis)



  1. Debugging and improving the C/C++11 memory model
     Viktor Vafeiadis, Max Planck Institute for Software Systems (MPI-SWS), January 2016

  2. The C11 memory model
     Defines the semantics of concurrent memory accesses in C/C++. Standardised by ISO C/C++ 2011. Used:
     ◮ by several POPL/PLDI/OOPSLA papers
     ◮ internally by LLVM IR
     ◮ indirectly by every program

  3. The C11 memory model: atomics
     Two types of locations:
     ◮ Ordinary (non-atomic): races are errors.
     ◮ Atomic: welcome to the expert mode.

  4. The C11 memory model: a spectrum of accesses
     ◮ Seq. consistent: full memory fence
     ◮ Release write: no fence (x86); lwsync (PPC)
     ◮ Acquire read: no fence (x86); isync (PPC)
     ◮ Relaxed: no fence
     ◮ Non-atomic: no fence, races are errors
     Explicit primitives for fences are also provided.
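
To make the spectrum concrete, here is a minimal sketch of how each kind of access is written with the C++11 `<atomic>` API. The helper names (`sc_write`, `release_write`, `fenced_acquire_read`, and so on) are hypothetical. The code only selects a `std::memory_order`; the x86/PPC fence column above is the hardware mapping given on the slide, which this code does not control.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> flag{0};  // atomic location
int plain = 0;             // ordinary location: a data race on it is undefined behaviour

// Seq. consistent, release, acquire and relaxed accesses to the atomic location.
void sc_write(int v)      { flag.store(v, std::memory_order_seq_cst); }
void release_write(int v) { flag.store(v, std::memory_order_release); }
int  acquire_read()       { return flag.load(std::memory_order_acquire); }
int  relaxed_read()       { return flag.load(std::memory_order_relaxed); }

// Non-atomic access: no ordering guarantees; only safe if it cannot race.
void nonatomic_write(int v) { plain = v; }

// Explicit fences: a release fence before a relaxed store, paired with an
// acquire fence after a relaxed load that reads that store, also synchronises.
void fenced_release_write(int v) {
  std::atomic_thread_fence(std::memory_order_release);
  flag.store(v, std::memory_order_relaxed);
}
int fenced_acquire_read() {
  int v = flag.load(std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_acquire);
  return v;
}
```

The fence pair at the end is one illustration of how fences combine with relaxed accesses, not the only pattern.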

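  5. An execution in C11: actions and relations (and axioms)
     Program (initially a = x = 0):
       Thread 1: a = 5; x.store(1, release);
       Thread 2: while (x.load(acq) == 0); print(a);
     Execution: the initialisation writes $W_{na}(a,0)$ and $W_{na}(x,0)$ precede both threads in po. Thread 1 performs $W_{na}(a,5) \xrightarrow{po} W_{rel}(x,1)$; thread 2 performs $R_{acq}(x,0) \xrightarrow{po} R_{acq}(x,1) \xrightarrow{po} R_{na}(a,5)$. The first acquire read reads from (rf) the initial write to x; the second reads from $W_{rel}(x,1)$, which induces a synchronises-with (sw) edge; the read of a reads from $W_{na}(a,5)$.
     Happens-before: $hb \triangleq (po \cup sw)^+$.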
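
The program of slide 5 can be written in C++ roughly as follows. This is a sketch assuming `std::thread` for the two threads; `run_message_passing` and `observed` are names introduced here, with `observed` standing in for `print(a)`. Because the acquire load reads the release store, the resulting sw edge puts `a = 5` in hb before the read of `a`, so the function always returns 5.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Message passing: release store of the flag, acquire spin on the reader side.
int run_message_passing() {
  int a = 0;               // non-atomic payload
  std::atomic<int> x{0};   // atomic flag
  int observed = -1;

  std::thread t1([&] {
    a = 5;
    x.store(1, std::memory_order_release);
  });
  std::thread t2([&] {
    while (x.load(std::memory_order_acquire) == 0) {
      // spin until the release store is visible
    }
    observed = a;  // race-free: a = 5 happens before this read
  });
  t1.join();
  t2.join();
  return observed;
}
```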

  6. Relaxed behaviour: store buffering
     Initially x = y = 0.
       Thread 1: x.store(1, rlx); t1 = y.load(rlx);
       Thread 2: y.store(1, rlx); t2 = x.load(rlx);
     This can return t1 = t2 = 0.
     Justification: from [x = y = 0], the execution $W_{rlx}(x,1) \xrightarrow{po} R_{rlx}(y,0)$ and $W_{rlx}(y,1) \xrightarrow{po} R_{rlx}(x,0)$, with both reads taking their values from the initialisation. This behaviour is observed on x86/Power/ARM.
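
A sketch of the store-buffering test with C++ relaxed atomics; the function names are hypothetical. The model permits t1 = t2 = 0, but whether a given run exhibits it depends on the hardware and on thread start-up timing, so a count of zero says nothing about the model.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// One run of store buffering with relaxed accesses; returns (t1, t2).
std::pair<int, int> store_buffering_once() {
  std::atomic<int> x{0}, y{0};
  int t1 = -1, t2 = -1;
  std::thread a([&] {
    x.store(1, std::memory_order_relaxed);
    t1 = y.load(std::memory_order_relaxed);
  });
  std::thread b([&] {
    y.store(1, std::memory_order_relaxed);
    t2 = x.load(std::memory_order_relaxed);
  });
  a.join();
  b.join();
  return {t1, t2};
}

// Number of runs, out of `runs`, that ended with t1 = t2 = 0.
int count_weak_outcomes(int runs) {
  int weak = 0;
  for (int i = 0; i < runs; ++i) {
    std::pair<int, int> r = store_buffering_once();
    if (r.first == 0 && r.second == 0) ++weak;
  }
  return weak;
}
```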

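  8. Coherence
     Programs with a single shared variable behave as under SC.
       Thread 1: x.store(1, rlx); x.store(2, rlx);
       Thread 2: a = x.load(rlx); b = x.load(rlx);
     The outcome a = 2 ∧ b = 1 is forbidden. In that execution $W_{rlx}(x,1) \xrightarrow{mo_x} W_{rlx}(x,2)$, and $R_{rlx}(x,1) \xrightarrow{rb_x} W_{rlx}(x,2)$ because the read takes its value from a write that is $mo_x$-before $W_{rlx}(x,2)$. Together with $W_{rlx}(x,2) \xrightarrow{rf} R_{rlx}(x,2) \xrightarrow{po} R_{rlx}(x,1)$ this forms a cycle.
     ◮ Modification order, $mo_x$: a total order of the writes to x.
     ◮ Reads-before: $rb_x \triangleq (rf^{-1} ; mo_x) \cap (\neq)$
     ◮ Coherence: $hb \cup rf_x \cup mo_x \cup rb_x$ is acyclic for all x.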
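
The coherence axiom can be checked mechanically on a single-location execution. The sketch below (all names hypothetical) builds $rb_x$ from rf and $mo_x$ as defined on the slide, adds hb, rf and $mo_x$, and searches the union for a cycle. It assumes hb is just the po edges, since relaxed accesses add no sw edges, and it leaves out the initialisation write, as the slide's graph does; checking base edges instead of their transitive closure does not change whether a cycle exists.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (from, to) between event ids 0..n-1

// Depth-first search for a cycle in a directed graph.
bool has_cycle(int n, const std::vector<Edge>& edges) {
  std::vector<std::vector<int>> succ(n);
  for (const Edge& e : edges) succ[e.first].push_back(e.second);
  std::vector<int> state(n, 0);  // 0 = unvisited, 1 = on the DFS stack, 2 = finished
  std::function<bool(int)> dfs = [&](int u) -> bool {
    state[u] = 1;
    for (int v : succ[u]) {
      if (state[v] == 1) return true;  // back edge closes a cycle
      if (state[v] == 0 && dfs(v)) return true;
    }
    state[u] = 2;
    return false;
  };
  for (int u = 0; u < n; ++u)
    if (state[u] == 0 && dfs(u)) return true;
  return false;
}

// Coherence for one location: hb ∪ rf ∪ mo ∪ rb must be acyclic.
// mo_order lists the writes in modification order; rf pairs are (write, read).
bool coherent(int n, const std::vector<Edge>& hb, const std::vector<Edge>& rf,
              const std::vector<int>& mo_order) {
  std::vector<Edge> all(hb);
  all.insert(all.end(), rf.begin(), rf.end());
  for (std::size_t i = 0; i < mo_order.size(); ++i)
    for (std::size_t j = i + 1; j < mo_order.size(); ++j)
      all.push_back({mo_order[i], mo_order[j]});  // mo is a total order
  // rb = (rf^-1 ; mo) ∩ (≠): a read is before every write mo-after its source.
  // The ∩ (≠) only matters for atomic updates, which this sketch does not model.
  for (const Edge& e : rf) {
    bool after_source = false;
    for (int w : mo_order) {
      if (after_source) all.push_back({e.second, w});
      if (w == e.first) after_source = true;
    }
  }
  return !has_cycle(n, all);
}

// Slide example. Events: 0 = W(x,1), 1 = W(x,2) in thread 1;
// 2 = the read into a, 3 = the read into b in thread 2.
bool outcome_allowed(int a, int b) {
  std::vector<Edge> po = {{0, 1}, {2, 3}};
  auto writer_of = [](int value) { return value == 1 ? 0 : 1; };  // values 1 or 2 only
  std::vector<Edge> rf = {{writer_of(a), 2}, {writer_of(b), 3}};
  return coherent(4, po, rf, {0, 1});
}
```

Of the four combinations of a and b in {1, 2}, only a = 2, b = 1 closes a cycle.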

  9. Causality cycles with relaxed accesses
     Initially x = y = 0.
       Thread 1: if (x.load(rlx) == 1) y.store(1, rlx);
       Thread 2: if (y.load(rlx) == 1) x.store(1, rlx);
     C11 allows the outcome x = y = 1.
     Justification: $R_{rlx}(x,1) \xrightarrow{po} W_{rlx}(y,1)$ and $R_{rlx}(y,1) \xrightarrow{po} W_{rlx}(x,1)$, each read taking its value from the other thread's write. Relaxed accesses don't synchronise, so nothing rules this cycle out.

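  10. No causality cycles with non-atomics
      Initially x = y = 0.
        Thread 1: if (x == 1) y = 1;
        Thread 2: if (y == 1) x = 1;
      C11 forbids the outcome x = y = 1.
      Justification: the non-atomic read axiom, $rf \cap (\_ \times NA) \subseteq hb$, puts both rf edges into hb; together with the two po edges they would form an hb cycle, which is impossible.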
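
For contrast, the non-atomic program can be run as ordinary C++. A sketch assuming plain `int` globals and `std::thread`, with the hypothetical function name `causality_cycle_observed`: under SC neither condition is ever true, so no execution writes x or y, the program is data-race-free, and the DRF guarantee means it can only end with x = y = 0.

```cpp
#include <cassert>
#include <thread>

int x = 0, y = 0;  // ordinary (non-atomic) shared variables

// One run of the slide's two threads; returns true if x = y = 1 was observed.
bool causality_cycle_observed() {
  x = 0;
  y = 0;
  std::thread t1([] { if (x == 1) y = 1; });
  std::thread t2([] { if (y == 1) x = 1; });
  t1.join();
  t2.join();
  return x == 1 && y == 1;
}
```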

  11–16. Is the C11 memory model definition...
      1. Mathematically sane? (For example, is it monotone?) ✗ No, it is not monotone.
      2. Not too weak? (Does it provide useful reasoning principles?) ≈ Reasoning principles for C11 subsets.
      3. Not too strong? (Can it be implemented efficiently?) ≈ Compilation to x86/Power/ARM.
      4. Actually useful? (Does it admit the intended program optimisations?) ✗ No, it disallows intended program transformations.

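  17. Non-atomic reads of atomic variables are unsound!
      Initially, x = 0.
        Thread 1: x.store(1, rlx);
        Thread 2: if (x.load(rlx) == 1) t = (int) x;
      The program can get stuck! The execution has $W_{na}(x,0)$ (initialisation), $W_{rlx}(x,1)$, and in thread 2 $R_{rlx}(x,1) \xrightarrow{po} R_{na}(x,?)$, where $R_{rlx}(x,1)$ reads from $W_{rlx}(x,1)$. No value works for the non-atomic read:
      ◮ Reading 0 contradicts coherence.
      ◮ Reading 1 contradicts the non-atomic read axiom.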
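
The two bullets can be checked with a small relation calculator. The sketch below (hypothetical names, four fixed events) builds hb, rf, mo and rb for each candidate value of the non-atomic read and tests both axioms. It assumes the initialisation write happens before every other event and that the relaxed write and read add no sw edge.

```cpp
#include <cassert>
#include <vector>

// Events (hypothetical numbering):
//   0 = W_na(x,0) initialisation    1 = W_rlx(x,1) in thread 1
//   2 = R_rlx(x,1) in thread 2       3 = R_na(x,v) in thread 2, po-after event 2
const int kEvents = 4;
using Rel = std::vector<std::vector<bool>>;

Rel empty_rel() { return Rel(kEvents, std::vector<bool>(kEvents, false)); }

// Transitive closure (Warshall's algorithm).
Rel closure(Rel r) {
  for (int k = 0; k < kEvents; ++k)
    for (int i = 0; i < kEvents; ++i)
      for (int j = 0; j < kEvents; ++j)
        if (r[i][k] && r[k][j]) r[i][j] = true;
  return r;
}

struct Check { bool coherent; bool na_read_axiom; };

Check check_value(int v) {
  Rel hb = empty_rel(), rf = empty_rel(), mo = empty_rel(), rb = empty_rel();
  hb[0][1] = true; hb[0][2] = true; hb[0][3] = true;  // initialisation comes first
  hb[2][3] = true;  // po in thread 2; relaxed accesses create no sw edge
  hb = closure(hb);
  const int source = (v == 0) ? 0 : 1;  // the write the non-atomic read reads from
  rf[1][2] = true;
  rf[source][3] = true;
  mo[0][1] = true;  // mo_x: initialisation, then the relaxed write
  // rb = rf^-1 ; mo: a read precedes every write that is mo-after its source.
  for (int w = 0; w < kEvents; ++w)
    for (int r = 0; r < kEvents; ++r)
      if (rf[w][r])
        for (int w2 = 0; w2 < kEvents; ++w2)
          if (mo[w][w2]) rb[r][w2] = true;
  // Coherence: hb ∪ rf ∪ mo ∪ rb must be acyclic (no event reaches itself).
  Rel u = empty_rel();
  for (int i = 0; i < kEvents; ++i)
    for (int j = 0; j < kEvents; ++j)
      u[i][j] = hb[i][j] || rf[i][j] || mo[i][j] || rb[i][j];
  u = closure(u);
  bool coherent = true;
  for (int i = 0; i < kEvents; ++i)
    if (u[i][i]) coherent = false;
  // Non-atomic read axiom: the rf edge into the non-atomic read must be in hb.
  return {coherent, static_cast<bool>(hb[source][3])};
}
```

For v = 0 the cycle is $W_{rlx}(x,1) \xrightarrow{rf} R_{rlx}(x,1) \xrightarrow{hb} R_{na}(x,0) \xrightarrow{rb} W_{rlx}(x,1)$; for v = 1 the source write is not hb-before the read.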

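  18. Sequentialisation is invalid
      Initially, a = x = y = 0.
        Thread 1: a = 1;
        Thread 2: if (x.load(rlx) == 1) if (a == 1) y.store(1, rlx);
        Thread 3: if (y.load(rlx) == 1) x.store(1, rlx);
      The only possible output is: a = 1, x = y = 0.
      Recall the non-atomic read axiom: $rf \cap (\_ \times NA) \subseteq hb$. Thread 2's read of a cannot return 1, because a = 1 does not happen before it, so neither y nor x is ever written. Sequentialising threads 1 and 2 (a = 1; if (x.load(rlx) == 1) if (a == 1) y.store(1, rlx);) makes a = 1 happen before the read of a, and the relaxed causality cycle of slide 9 then yields the new outcome x = y = 1.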

  20. Tentative fixes (open problem)
      ◮ Remove the non-atomic read axiom.
         ◮ Gives extremely weak guarantees, if any.
      ◮ In addition, forbid $(hb \cup rf)$-cycles.
         ◮ Rules out causal loops.
         ◮ Forbids some reorderings.
         ◮ More costly on ARM/Power.
      ◮ Or alternatively, forbid $(hb \cup rf)$-cycles with NA accesses.
         ◮ Allows more racy behaviours.
         ◮ Forbids some reorderings.

  21. Monotonicity
      “Adding synchronisation should not introduce new behaviours.” Examples:
      ◮ Reducing parallelism: $C_1 \parallel C_2 \rightsquigarrow C_1 ; C_2$
      ◮ Expression evaluation linearisation: x = a + b; ⇝ t1 = a; t2 = b; x = t1 + t2;
      ◮ Adding a memory fence
      ◮ Strengthening the access mode of an operation
      ◮ (Roach motel reorderings)

  22. Other problems fixed (POPL’15, POPL’16)
      ◮ The axiom of SC reads is too weak: it makes strengthening unsound.
      ◮ The axioms of SC fences are too weak: they do not guarantee sequential consistency.
      ◮ The definition of release sequences is too strong: removing $(po \cup rf)$-final events is unsound.

  23. Transformation correctness

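  24. Valid instruction reorderings a; b ⇝ b; a (POPL’15)
      Rows give the first instruction a, columns the second instruction b. ✓: the reordering is valid; ✗: it is not; =: two fences of the same kind. Subscripts give the access mode: ⊒ rel means "release or stronger" (likewise ⊒ acq), ≠ sc means "any mode except sc"; C is an atomic update (read-modify-write) and F a fence.

      a ↓ \ b →   R≠sc  R_sc  W_na  W_rlx  W⊒rel  C_rlx|acq  C⊒rel  F_acq  F_rel
      R_na        ✓     ✓     ✓     ✓      ✗      ✓          ✗      ✓      ✗
      R_rlx       ✓     ✓     ✓     (✓)    ✗      (✓)        ✗      ✗      ✗
      R⊒acq       ✗     ✗     ✗     ✗      ✗      ✗          ✗      ✓      ✗
      W≠sc        ✓     ✓     ✓     ✓      ✗      ✓          ✗      ✓      ✗
      W_sc        ✓     ✗     ✓     ✓      ✗      ✓          ✗      ✓      ✗
      C_rlx|rel   ✓     ✓     ✓     (✓)    ✗      (✓)        ✗      ✗      ✗
      C⊒acq       ✗     ✗     ✗     ✗      ✗      ✗          ✗      ✓      ✗
      F_acq       ✗     ✗     ✗     ✗      ✗      ✗          ✗      =      ✗
      F_rel       ✓     ✓     ✓     ✗      ✓      ✗          ✓      ✓      =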
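
One way to use the table is as a lookup, for instance in a compiler pass deciding whether two adjacent instructions may be swapped. The sketch below encodes the table's rows and columns; the enum and function names are hypothetical, the `Qualified` verdict simply records the (✓) entries without interpreting them, and in the F_acq row the = is placed on the diagonal (the F_acq column), mirroring the F_rel row.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// reorder(a, b) answers: is a; b ⇝ b; a valid?
enum class First  { R_na, R_rlx, R_acq_or_stronger, W_not_sc, W_sc,
                    C_rlx_or_rel, C_acq_or_stronger, F_acq, F_rel };
enum class Second { R_not_sc, R_sc, W_na, W_rlx, W_rel_or_stronger,
                    C_rlx_or_acq, C_rel_or_stronger, F_acq, F_rel };
enum class Verdict { Valid, Qualified, Invalid, Same };  // ✓, (✓), ✗, =

namespace {
constexpr Verdict Y = Verdict::Valid, Q = Verdict::Qualified,
                  N = Verdict::Invalid, E = Verdict::Same;

// Rows follow `First`, columns follow `Second`.
constexpr std::array<std::array<Verdict, 9>, 9> kTable = {{
    {{Y, Y, Y, Y, N, Y, N, Y, N}},  // R_na
    {{Y, Y, Y, Q, N, Q, N, N, N}},  // R_rlx
    {{N, N, N, N, N, N, N, Y, N}},  // R ⊒ acq
    {{Y, Y, Y, Y, N, Y, N, Y, N}},  // W ≠ sc
    {{Y, N, Y, Y, N, Y, N, Y, N}},  // W_sc
    {{Y, Y, Y, Q, N, Q, N, N, N}},  // C rlx|rel
    {{N, N, N, N, N, N, N, Y, N}},  // C ⊒ acq
    {{N, N, N, N, N, N, N, E, N}},  // F_acq ('=' on the diagonal)
    {{Y, Y, Y, N, Y, N, Y, Y, E}},  // F_rel
}};
}  // namespace

Verdict reorder(First a, Second b) {
  return kTable[static_cast<std::size_t>(a)][static_cast<std::size_t>(b)];
}
```

For example, `reorder(First::W_sc, Second::R_sc)` is `Verdict::Invalid`, the SC store-load case, while a non-SC write followed by a non-SC read may be swapped.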

  25. Redundant instruction eliminations (POPL’15)
      ◮ Overwritten write: x.store(v, M); C; x.store(v′, M) ⇝ C; x.store(v′, M), provided C has no release and no x accesses.
      ◮ Read after write: x.store(v, M); C; t = x.load(M′) ⇝ x.store(v, M); C; t = v, provided C has no acquire and no x accesses.
      ◮ Read after read: t = x.load(M); C; t′ = x.load(M) ⇝ t = x.load(M); C; t′ = t, provided C has no acquire and no x accesses.
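
The three rules read naturally as before/after pairs. A minimal C++ sketch, assuming relaxed accesses for the modes M and M′ and a hypothetical function `C()` standing in for intervening code with no release/acquire operations and no access to `x`. The rules claim that each `_after` version exhibits no behaviour the matching `_before` version lacks; run single-threaded, each pair computes the same result.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0};
int c_effect = 0;  // observable effect of the intervening code C

// Hypothetical stand-in for C: no release/acquire operations, no access to x.
void C() { c_effect += 1; }

// Overwritten write (requires: C has no release and no x accesses).
void ow_before(int v, int v2) {
  x.store(v, std::memory_order_relaxed);
  C();
  x.store(v2, std::memory_order_relaxed);
}
void ow_after(int /*v*/, int v2) {
  C();
  x.store(v2, std::memory_order_relaxed);
}

// Read after write (requires: C has no acquire and no x accesses).
int raw_before(int v) {
  x.store(v, std::memory_order_relaxed);
  C();
  return x.load(std::memory_order_relaxed);
}
int raw_after(int v) {
  x.store(v, std::memory_order_relaxed);
  C();
  return v;
}

// Read after read (requires: C has no acquire and no x accesses).
// Returns t' - t, which the transformed version always makes 0.
int rar_before() {
  int t = x.load(std::memory_order_relaxed);
  C();
  int t2 = x.load(std::memory_order_relaxed);
  return t2 - t;
}
int rar_after() {
  int t = x.load(std::memory_order_relaxed);
  C();
  int t2 = t;
  return t2 - t;
}
```

Under concurrency `rar_before` may return a non-zero difference while `rar_after` cannot, which is allowed: a valid transformation may remove behaviours but not add them.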

  26. Is DRF semantics really what we want?

  27. Should these transformations be allowed?
      1. CSE over a lock acquire:
           t1 = X; lock(); t2 = X;  ⇝  t1 = X; lock(); t2 = t1;
         If X changes in between, the program is racy.
      2. Load hoisting:
           if (c) r = X;  ⇝  t = X; r = c ? t : r;
         This may introduce a race, but the racy value is not used.

  28. Allowing both is clearly wrong!
      Consider the transformation sequence:
           if (c) r1 = X; lock(); r2 = X;
        ⇝  t = X; r1 = c ? t : r1; lock(); r2 = X;    (load hoisting)
        ⇝  t = X; r1 = c ? t : r1; lock(); r2 = t;    (CSE over lock())
      When c is false, the read of X is moved out of the critical region! So we have to forbid one transformation.
      ◮ C11 forbids load hoisting, allows CSE over lock().
      ◮ LLVM allows load hoisting, forbids CSE over lock().
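
The transformation sequence of slide 28, written as three C++ functions. This is a sketch: `std::mutex` stands in for `lock()`, the `unlock()` calls are an addition so the functions can be called repeatedly, and `Result` is a hypothetical helper type. Run single-threaded, all three agree; the difference only matters when another thread writes `X` under the same lock, because in `hoisted_then_cse` with `c == false` the value of `r2` comes from a read made before `m.lock()`.

```cpp
#include <cassert>
#include <mutex>

int X = 0;       // shared location
std::mutex m;    // the lock taken by lock()

struct Result { int r1; int r2; };

// Original program.
Result original(bool c, int r1) {
  if (c) r1 = X;
  m.lock();
  int r2 = X;
  m.unlock();
  return {r1, r2};
}

// Step 1, load hoisting: X is read unconditionally; the value is used only if c.
Result hoisted(bool c, int r1) {
  int t = X;
  r1 = c ? t : r1;
  m.lock();
  int r2 = X;
  m.unlock();
  return {r1, r2};
}

// Step 2, CSE over lock(): the read of X inside the critical region is
// replaced by t, so when c is false no read of X happens inside it at all.
Result hoisted_then_cse(bool c, int r1) {
  int t = X;
  r1 = c ? t : r1;
  m.lock();
  int r2 = t;
  m.unlock();
  return {r1, r2};
}
```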

  29. Taming the release-acquire fragment

  30. Recall the spectrum of C11 access types
      ◮ Seq. consistent: full memory fence
      ◮ Release write: no fence (x86); lwsync (PPC)
      ◮ Acquire read: no fence (x86); isync (PPC)
      ◮ Relaxed: no fence
      ◮ Non-atomic: no fence, races are errors

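  31. C11’s release-acquire memory model
      The C11 model where all reads are acquire, all writes are release, and all atomic updates are acquire/release.
      ◮ Store buffering [x = y = 0]:
          Thread 1: x := 1; print y
          Thread 2: y := 1; print x
        Both threads may print 0: $W(x,1) \xrightarrow{po} R(y,0)$ and $W(y,1) \xrightarrow{po} R(x,0)$, with each read taking its value (rf) from the initial write, which is $mo_x$- (respectively $mo_y$-) before the new write.
      ◮ Message passing [x = m = 0]:
          Thread 1: m := 42; x := 1
          Thread 2: while x = 0 skip; print m
        Only 42 may be printed. In the candidate execution $W(m,42) \xrightarrow{po} W(x,1) \xrightarrow{rf} R(x,1) \xrightarrow{po} R(m,0)$, the release write and acquire read put this whole path in hb, so $W(m,42)$ happens before $R(m,0)$; since the initial write to m is $mo_m$-before $W(m,42)$, reading 0 would violate coherence.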
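
The store-buffering half of the slide in C++: release stores with acquire loads give the RA fragment, and switching both to `seq_cst` gives SC. A sketch with hypothetical names (`sb_once`, `saw_both_zero`). It asserts nothing about the RA run, because (0, 0) is permitted but not guaranteed to appear, whereas the C++ standard forbids (0, 0) when all four accesses are `seq_cst`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// One run of store buffering with the given orders for the stores and loads.
// Returns the values printed by thread 1 (y) and thread 2 (x).
std::pair<int, int> sb_once(std::memory_order store_order, std::memory_order load_order) {
  std::atomic<int> x{0}, y{0};
  int printed_y = -1, printed_x = -1;
  std::thread t1([&] { x.store(1, store_order); printed_y = y.load(load_order); });
  std::thread t2([&] { y.store(1, store_order); printed_x = x.load(load_order); });
  t1.join();
  t2.join();
  return {printed_y, printed_x};
}

// True if both threads printed 0 in at least one of `runs` trials.
bool saw_both_zero(std::memory_order store_order, std::memory_order load_order, int runs) {
  for (int i = 0; i < runs; ++i) {
    std::pair<int, int> r = sb_once(store_order, load_order);
    if (r.first == 0 && r.second == 0) return true;
  }
  return false;
}
```

Passing the memory order as a run-time argument is legal; an implementation may then use a stronger order, which can only remove weak outcomes.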

  32. Good news
      ◮ Verified compilation schemes:
         ◮ x86-TSO (trivial compilation) [Batty et al. ’11]
         ◮ Power [Batty et al. ’12] [Sarkar et al. ’12]
      ◮ RA supports intended optimisations:
         ◮ in particular, write-read reordering (unlike SC): $W_x \rightarrow R_y \;\rightsquigarrow\; R_y \rightarrow W_x$
      ◮ DRF theorem:
         ◮ no data races under SC ensures no weak behaviours
      ◮ Monotonicity:
         ◮ adding synchronisation does not introduce new behaviours
      ◮ Program logics:
         ◮ RSL [Vafeiadis and Narayan ’13]
         ◮ GPS [Turon et al. ’14]
         ◮ OGRA [Lahav and Vafeiadis ’15]
