Load-reserve / Store-conditional on POWER and ARM (Peter Sewell)



slide-1
SLIDE 1

Load-reserve / Store-conditional on POWER and ARM

Peter Sewell (slides from Susmit Sarkar)

University of Cambridge

June 2012

slide-3
SLIDE 3

Correct implementations of C/C++ on hardware

Can it be done?

◮ . . . on highly relaxed hardware? e.g. Power

What is involved?

◮ Mapping new constructs to assembly
◮ Optimizations: which ones legal?

Peter Sewell (Cambridge) Load-reserve / Store-conditional on POWER and ARM June 2012

slide-8
SLIDE 8

Implementing C/C++11 on POWER: Pointwise Mapping

C/C++11 Operation     POWER Implementation

Store (non-atomic)    st
Load (non-atomic)     ld

Store relaxed         st
Store release         lwsync; st
Store seq-cst         lwsync; st

Load relaxed          ld
Load consume          ld (and preserve dependency)
Load acquire          ld; cmp; bc; isync
Load seq-cst          sync; ld; cmp; bc; isync

Fence acquire         lwsync
Fence release         lwsync
Fence seq-cst         sync

CAS relaxed           loop: lwarx; cmp; bc exit; stwcx.; bc loop; exit:
CAS seq-cst           sync; loop: lwarx; cmp; bc exit; stwcx.; bc loop; isync; exit:
. . .                 ...

Is that mapping correct?

(From Paul McKenney and Raul Silvera)

slide-9
SLIDE 9

Implementing C/C++11 on POWER: Pointwise Mapping

C/C++11 Operation     POWER Implementation

Store (non-atomic)    st
Load (non-atomic)     ld

Store relaxed         st
Store release         lwsync; st
Store seq-cst         sync; st (replacing lwsync; st)

Load relaxed          ld
Load consume          ld (and preserve dependency)
Load acquire          ld; cmp; bc; isync
Load seq-cst          sync; ld; cmp; bc; isync

Fence acquire         lwsync
Fence release         lwsync
Fence seq-cst         sync

CAS relaxed           loop: lwarx; cmp; bc exit; stwcx.; bc loop; exit:
CAS seq-cst           sync; loop: lwarx; cmp; bc exit; stwcx.; bc loop; isync; exit:
. . .                 ...

Answer: No!

(From Paul McKenney and Raul Silvera)

slide-10
SLIDE 10

Implementing C/C++11 on POWER: Pointwise Mapping

C/C++11 Operation     POWER Implementation

Store (non-atomic)    st
Load (non-atomic)     ld

Store relaxed         st
Store release         lwsync; st
Store seq-cst         sync; st

Load relaxed          ld
Load consume          ld (and preserve dependency)
Load acquire          ld; cmp; bc; isync
Load seq-cst          sync; ld; cmp; bc; isync

Fence acquire         lwsync
Fence release         lwsync
Fence seq-cst         sync

CAS relaxed           loop: lwarx; cmp; bc exit; stwcx.; bc loop; exit:
CAS seq-cst           sync; loop: lwarx; cmp; bc exit; stwcx.; bc loop; isync; exit:
. . .                 ...

Is that mapping correct? Answer: Yes!

(From Paul McKenney and Raul Silvera)
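To make the mapping table concrete, here is a minimal C++11 message-passing sketch. The names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are hypothetical, chosen only for illustration. The comments give the POWER sequence the table lists for each operation; what a particular compiler actually emits may differ.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // ordinary (non-atomic) data
std::atomic<bool> ready{false};  // synchronisation flag

void producer() {
    payload = 42;                                  // Store (non-atomic): st
    ready.store(true, std::memory_order_release);  // Store release: lwsync; st
}

int consumer() {
    // Load acquire: ld; cmp; bc; isync
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;                                // Load (non-atomic): ld
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Because the release store synchronises with the acquire load that reads `true`, the program is data-race free and the consumer is guaranteed to see the payload written before the flag.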

slide-11
SLIDE 11

Implementing C/C++11 on POWER: Pointwise Mapping

Is that the only correct mapping? Answer: No!

(From Paul McKenney and Raul Silvera)

slide-12
SLIDE 12

Implementing C/C++11 on POWER: Pointwise Mapping

C/C++11 Operation     POWER Implementation           Alternative

Store (non-atomic)    st
Load (non-atomic)     ld

Store relaxed         st
Store release         lwsync; st
Store seq-cst         sync; st                       sync; st; sync;

Load relaxed          ld
Load consume          ld (and preserve dependency)
Load acquire          ld; cmp; bc; isync
Load seq-cst          sync; ld; cmp; bc; isync       ld; sync

Fence acquire         lwsync
Fence release         lwsync
Fence seq-cst         sync

CAS relaxed           loop: lwarx; cmp; bc exit; stwcx.; bc loop; exit:
CAS seq-cst           sync; loop: lwarx; cmp; bc exit; stwcx.; bc loop; isync; exit:
. . .                 ...

All compilers must agree for separate compilation

slide-14
SLIDE 14

Machine Synchronisation Operations

x86: atomic synchronization operations, e.g. “atomic add”, “CAS”, . . .

RISC-friendly alternative: Load-reserve/Store-conditional (aka LL/SC, larx/stcx and lwarx/stwcx, LDREX/STREX)

◮ Can be used to implement CAS, atomic add, spinlocks, . . .
◮ Universal (like CAS) [Herlihy’93] (but no ABA problem)

Atomic Addition

loop: lwarx r, d; add r,v,r; stwcx r, d; bne loop;

Informally, stwcx succeeds only if there has been no other write to the same address since the last lwarx, and it sets a flag iff it succeeds
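The retry-loop shape of the atomic addition above can be sketched in portable C++11. This is only an analogy: `atomic_add` is a hypothetical helper, and it uses `compare_exchange_weak`, which compares values and so, unlike a real load-reserve/store-conditional pair, is exposed to the ABA problem. It does share one property with stwcx: it is allowed to fail spuriously, so it has to sit in a loop.

```cpp
#include <atomic>

// Hypothetical helper: atomic addition written as an explicit retry loop,
// mirroring the shape of the lwarx/stwcx loop on the slide.
int atomic_add(std::atomic<int>& d, int v) {
    int r = d.load(std::memory_order_relaxed);  // plays the role of lwarx
    // compare_exchange_weak may fail spuriously, much as a store-conditional
    // can. On failure it reloads the current value into r and we retry.
    while (!d.compare_exchange_weak(r, r + v,
                                    std::memory_order_relaxed,
                                    std::memory_order_relaxed)) {
    }
    return r + v;  // the value written by the successful store
}
```

For example, calling `atomic_add(x, 3)` on an atomic holding 2 leaves 5 in `x` and returns 5.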

slide-16
SLIDE 16

What is no write since . . . ?

In machine time?

◮ Neither necessary, nor sufficient

Microarchitecturally (simplified): if cache-line ownership not lost since last lwarx

(but we don’t want to model the microarchitecture...)

slide-18
SLIDE 18

Modeling “not lost since”

Abstractly: ownership chain modeled by building up coherence order

Coherence: order relating stores to the same location (eventually linear)

A stwcx succeeds only if it is (or at least, if it can become) coherence-next-to the write read from by lwarx

. . . and no other write can later come in between

Isolate key concept: write reaching coherence point

◮ coherence is linear below this write, and no new edges will be added below

slide-20
SLIDE 20

Coherence points and a successful stwcx

Atomic Addition

loop: lwarx r, x; add r,3,r; stwcx r, x; bne loop;

Coherence order for x:

[Figure: coherence graph over the writes i:W x=0, j:W x=1, a:W x=2, b:W x=3, c:W x=4]

Suppose lwarx reads from a:W x=2

stwcx can succeed if this becomes possible:

[Figure: coherence graph over i:W x=0, j:W x=1, a:W x=2, d:W∗ x=5, c:W x=4, b:W x=3, with a label marking the writes that have reached coherence point]

Warning: stwcx can fail spuriously
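The success condition can be illustrated with a deliberately tiny toy model. Everything here is an assumption made for illustration: `Location` and `store_conditional` are hypothetical names, the model tracks only the writes that have reached coherence point, and it ignores spurious failure. It is not the formal model.

```cpp
#include <string>
#include <vector>

// Toy model (an assumption for illustration, not the formal model):
// the writes to one location that have reached coherence point, in order.
struct Location {
    std::vector<std::string> at_coherence_point;
};

// A store-conditional whose load-reserve read from `read_from` may succeed
// only if its write can be placed coherence-immediately after that write.
// In this toy model that means `read_from` is the newest write that has
// reached coherence point. Spurious failure is not modelled.
bool store_conditional(Location& loc, const std::string& read_from,
                       const std::string& new_write) {
    if (loc.at_coherence_point.empty() ||
        loc.at_coherence_point.back() != read_from) {
        return false;  // some other write already sits after read_from
    }
    loc.at_coherence_point.push_back(new_write);  // new write reaches coherence point
    return true;
}
```

Assuming i, j and a have reached coherence point, a store-conditional paired with a load-reserve that read from a succeeds and places d directly after a. A second attempt that also read from a then fails, because d now sits in between.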

slide-21
SLIDE 21

Load-reserve/store-conditional and ordering

Same-thread load-reserve/store-conditionals ordered by program order

If all memory accesses are l-r/s-c sequences
Then: only SC behaviour

But . . . normal loads/stores (to different addresses) not ordered; the l-r/s-c do not act as a barrier

Confusion here led to Linux bug . . . bad barrier placement in atomic-add-return
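At the C++11 level the same distinction shows up as the memory order requested on a read-modify-write. The sketch below is not the Linux kernel code; `add_return_relaxed` and `add_return_ordered` are hypothetical helpers that contrast an add-and-return that is merely atomic with one that also orders the surrounding accesses.

```cpp
#include <atomic>

// Relaxed read-modify-write: atomic on its own location, but it does not
// order the surrounding ordinary loads and stores to other addresses.
int add_return_relaxed(std::atomic<int>& counter, int v) {
    return counter.fetch_add(v, std::memory_order_relaxed) + v;
}

// Sequentially consistent read-modify-write: the ordering is requested
// explicitly, so the compiler has to place barriers around the
// load-reserve/store-conditional loop it generates.
int add_return_ordered(std::atomic<int>& counter, int v) {
    return counter.fetch_add(v, std::memory_order_seq_cst) + v;
}
```

Both functions return the same values in a single thread. They differ only in what other threads may observe about neighbouring non-atomic accesses, so code that relies on the add as a barrier needs the second form or explicit fences.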

slide-23
SLIDE 23

Correctness of the Mapping

Theorem: For any sane, non-optimising compiler following the mapping:

[Diagram: nodes "DRF C/C++ prog", "POWER prog", "C/C++11 execution" and "POWER execution", with "observations" under each execution; arrows labelled "compilation", "C/C++11 semantics" and "POWER semantics"]

Preserves memory accesses
Uses the mapping table
Respects the thread local semantics of C/C++, preserving dependencies

slide-24
SLIDE 24

Correctness of the Mapping

Theorem: For any sane, non-optimising compiler following the mapping:

[Diagram: nodes "DRF C/C++ prog", "POWER prog", "C/C++11 execution" and "POWER execution", with "observations" under each execution; arrows labelled "compilation", "C/C++11 semantics" and "POWER semantics"]

From POWER trace, build key relations (happens-before, SC order)
Required properties from abs. machine properties
If trace looks like it produces data race, build the C/C++ data race
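One way to read the theorem diagram is as an inclusion of observable behaviours. The notation below is not from the slides: DRF, compile and Obs are names introduced here for a data-race-free source program, the compiler that follows the mapping, and the set of observations of a program under the given semantics.

```latex
\forall P.\quad \mathrm{DRF}(P) \implies
  \mathrm{Obs}_{\mathrm{POWER}}\bigl(\mathrm{compile}(P)\bigr)
  \subseteq
  \mathrm{Obs}_{\mathrm{C/C++11}}(P)
```

Under this reading, every observation of the compiled program on POWER is also an observation of some C/C++11 execution of the source program.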

slide-25
SLIDE 25

For details...

see Synchronising C/C++ and POWER, Sarkar et al., PLDI 2012
http://www.cl.cam.ac.uk/~pes20/cppppc-supplemental/

In the paper:

A formal model of load-reserve/store-conditional (in Lem)
An executable model with exploration tool (ppcmem)
Simplifications to the C/C++11 lock model
Models “tight” against each other: relaxing the Power model would make C/C++11 unimplementable