Relaxed Systems Architecture: Instruction Fetching (Ben Simner)



slide-1
SLIDE 1

1/41

Relaxed Systems Architecture: Instruction Fetching Ben Simner University of Cambridge

In collaboration with Shaked Flur, Christopher Pulte, Alasdair Armstrong, Jean Pichon, Luc Maranget (INRIA Paris), and Peter Sewell

slide-2
SLIDE 2

2/41

Motivation

Why?

Want to understand: TLBs, Instruction Caches, Interrupts

Want to prove: Operating Systems, JITs, Hypervisors

slide-3
SLIDE 3

3/41

But first... Computers are fast... but terrible!

slide-4
SLIDE 4

4/41

Intel (Skylake) die

Source: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)


slide-9
SLIDE 9

6/41

x86: Observable complexity

Dekker’s/Peterson’s mutual exclusion algorithm (extract)

Thread A: flagA ← 1; while flagB {}; print(“A”)
Thread B: flagB ← 1; while flagA {}; print(“B”)

x86 hardware can execute both prints!

slide-10
SLIDE 10

7/41

x86: TSO Architecture

Source Code

Thread A: flagA ← 1; print(flagB)
Thread B: flagB ← 1; print(flagA)

Model

CPU0 Store Buffer: flagA = 1
CPU1 Store Buffer: flagB = 1
. . .
RAM: flagA = 0, flagB = 0
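The store-buffer picture above can be explored exhaustively with a small script. This is a minimal sketch under stated assumptions, not the x86-TSO model itself: the state encoding, the `explore` function and the `PROGRAM` table are hypothetical names, each thread has a FIFO buffer, a read checks the thread's own buffer before memory, and a buffered write may drain to memory at any time.

```python
# Toy TSO machine for the two-thread test above (illustrative only).
PROGRAM = (
    (("write", "flagA", 1), ("read", "flagB")),   # Thread A
    (("write", "flagB", 1), ("read", "flagA")),   # Thread B
)

def read_value(tid, addr, mem, bufs):
    # The newest buffered write of the same thread wins, otherwise memory.
    for a, v in reversed(bufs[tid]):
        if a == addr:
            return v
    return mem[addr]

def explore():
    """Return every reachable pair (value printed by A, value printed by B)."""
    init = ((0, 0), (("flagA", 0), ("flagB", 0)), ((), ()), (None, None))
    seen, stack, outcomes = set(), [init], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, mem_items, bufs, printed = state
        mem = dict(mem_items)
        successors = []
        for i in range(len(PROGRAM)):
            # Option 1: thread i executes its next instruction.
            if pcs[i] < len(PROGRAM[i]):
                instr = PROGRAM[i][pcs[i]]
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if instr[0] == "write":
                    _, addr, val = instr
                    new_bufs = bufs[:i] + (bufs[i] + ((addr, val),),) + bufs[i + 1:]
                    successors.append((new_pcs, mem_items, new_bufs, printed))
                else:
                    val = read_value(i, instr[1], mem, bufs)
                    new_printed = printed[:i] + (val,) + printed[i + 1:]
                    successors.append((new_pcs, mem_items, bufs, new_printed))
            # Option 2: the oldest buffered write of thread i drains to memory.
            if bufs[i]:
                (addr, val), rest = bufs[i][0], bufs[i][1:]
                new_mem = dict(mem)
                new_mem[addr] = val
                new_bufs = bufs[:i] + (rest,) + bufs[i + 1:]
                successors.append(
                    (pcs, tuple(sorted(new_mem.items())), new_bufs, printed))
        if not successors:           # both threads done, both buffers empty
            outcomes.add(printed)
        stack.extend(successors)
    return outcomes

print(sorted(explore()))
```

The outcome `(0, 0)` is reachable because both writes can still sit in their store buffers when the reads happen. In the Dekker/Peterson extract this corresponds to both threads seeing the other flag as unset.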

slide-11
SLIDE 11

8/41

State of the Art

Models:
◮ Abstract Hardware Operational
◮ Axiomatic-Style

slide-12
SLIDE 12

9/41

x86-TSO: Operational Semantics

◮ State = Abstracted Machine State m :

  • M : addr → value;
  • B : tid → (addr × value) list;

◮ Structural Operational Semantics

\[
\frac{m' = m \ \text{with}\ B := m.B \oplus \bigl(t \mapsto ((x, v) : m.B\ t)\bigr)}
     {m \xrightarrow{\;t \,:\, W\,x = v\;} m'}
\ \textsc{WB}
\]

slide-13
SLIDE 13

10/41

x86-TSO: Axiomatic-Style

Source Code

Thread 0: x ← 1; print(y)
Thread 1: y ← 1; print(x)

Potential Execution #1: W x=1, R y=0 | W y=1, R x=1
Potential Execution #2: W x=1, R y=1 | W y=1, R x=0

. . .

slide-14
SLIDE 14

11/41

A Candidate Execution

Pre-execution = Set of Events + Induced Binary Relations (po/data/addr)
Candidate = Pre-execution + Existentially Quantified Relations (co/rf)

Allowed Execution
[Diagram: events W x=1, R y=0, W y=1, R x=1, connected by po and rf edges]
po = Program-Order, rf = Reads-From

Definition of a valid Candidate (“Axiomatic Model”):

poWR    = po ∩ (W × R)
uniproc = po-loc ∪ (po \ poWR)
fr      = rf⁻¹ ; co
tso     = rf ∪ fr ∪ co
axiom: acyclic (uniproc ∪ tso)

slide-15
SLIDE 15

12/41

TSO: Forbidden Execution

Forbidden Execution
[Diagram: events W x=1, W y=1, R y=1, R x=0, connected by po, rf and fr edges]
po = Program-Order, rf = Reads-From, fr = From-Reads

Axiomatic Model:

poWR    = po ∩ (W × R)
uniproc = po-loc ∪ (po \ poWR)
fr      = rf⁻¹ ; co
tso     = rf ∪ fr ∪ co
axiom: acyclic (uniproc ∪ tso)

slide-16
SLIDE 16

13/41

TSO: Allowed Execution

Allowed Execution
[Diagram: events W x=1, R y=0, W y=1, R x=0, connected by po, rf and fr edges]
po = Program-Order, rf = Reads-From, fr = From-Reads

Axiomatic Model:

poWR    = po ∩ (W × R)
uniproc = po-loc ∪ (po \ poWR)
fr      = rf⁻¹ ; co
tso     = rf ∪ fr ∪ co
axiom: acyclic (uniproc ∪ tso)
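The axiom can be checked mechanically on the two executions shown. The sketch below is illustrative rather than the authors' tooling: `tso_allows` and `acyclic` are hypothetical helper names, and the initial-state writes `ix` and `iy` (which the diagrams leave implicit) are added as explicit events so that a read of 0 has a write to read from.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "visiting" or "done"

    def visit(n):
        if state.get(n) == "done":
            return True
        if state.get(n) == "visiting":
            return False          # back edge: found a cycle
        state[n] = "visiting"
        ok = all(visit(m) for m in graph[n])
        state[n] = "done"
        return ok

    return all(visit(n) for n in graph)

def tso_allows(events, po, rf, co):
    """events maps a name to (kind, location); po, rf, co are sets of pairs."""
    kind = lambda e: events[e][0]
    loc = lambda e: events[e][1]
    po_wr = {(a, b) for (a, b) in po if kind(a) == "W" and kind(b) == "R"}
    po_loc = {(a, b) for (a, b) in po if loc(a) == loc(b)}
    uniproc = po_loc | (po - po_wr)
    # fr = rf^-1 ; co : a read is before every write that overwrites its source
    fr = {(r, w2) for (w1, r) in rf for (w, w2) in co if w == w1}
    tso = rf | fr | co
    return acyclic(uniproc | tso)

init = {"ix": ("W", "x"), "iy": ("W", "y")}

# Forbidden execution: W x=1; W y=1 on one thread, R y=1; R x=0 on the other.
forbidden = tso_allows(
    {**init, "a": ("W", "x"), "b": ("W", "y"), "c": ("R", "y"), "d": ("R", "x")},
    po={("a", "b"), ("c", "d")},
    rf={("b", "c"), ("ix", "d")},
    co={("ix", "a"), ("iy", "b")},
)

# Allowed execution: W x=1; R y=0 on one thread, W y=1; R x=0 on the other.
allowed = tso_allows(
    {**init, "a": ("W", "x"), "b": ("R", "y"), "c": ("W", "y"), "d": ("R", "x")},
    po={("a", "b"), ("c", "d")},
    rf={("iy", "b"), ("ix", "d")},
    co={("ix", "a"), ("iy", "c")},
)

print(forbidden, allowed)
```

In the first execution the edges a → b (po), b → c (rf), c → d (po), d → a (fr) form a cycle. In the second, both po edges are write-to-read pairs and are dropped by `po \ poWR`, so no cycle remains.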

slide-17
SLIDE 17

14/41

“user-mode” concurrency

Much work not covered here:
◮ Fences
◮ Atomics
◮ Mixed-size
◮ Multi-copy atomicity
◮ Other Architectures: IBM Power, Arm, RISC-V

slide-18
SLIDE 18

15/41

Systems Architecture Semantics

Pagetables and TLBs
Instruction Fetch (ESOP 2020)
Exceptions and Interrupts (with Ohad Kammar)
Devices and NVME
Future Work . . .

slide-19
SLIDE 19

16/41

JITs

Just-In-Time Compilation

Source Code: CALL f; CALL g; CALL f; . . .
Jump Table: Jump 0x1000; Jump 0x2000; . . .
Compiled Code: f: . . . ; g: . . .

Optimized code now unsound, have to re-compile!

slide-20
SLIDE 20

17/41

JITs

JIT: de-opt after executing g

Source Code (with PC marker): CALL f; CALL g; CALL f; . . .
Jump Table: Jump 0x1000; Jump 0x2000; . . .
Compiled Code: f: . . . ; g: . . .

Optimized code now unsound, have to re-compile!

slide-21
SLIDE 21

18/41

JITs

JIT: re-compile

Source Code (with PC marker): CALL f; CALL g; CALL f; . . .
Jump Table: Jump 0x1000; Jump 0x2000; Jump 0x3000; . . .
Compiled Code: f: . . . ; g: . . . ; f: (re-compiled)

Optimized code now unsound, have to re-compile!
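The jump-table indirection in the JIT slides can be sketched in a few lines. This is an illustration only: `install`, `call`, `code` and `jump_table` are hypothetical names, Python callables stand in for machine code, and the sketch assumes the entry for f is redirected to the new address 0x3000 after re-compilation. A dictionary update is immediately visible here, which is exactly what real hardware does not guarantee for newly written instructions.

```python
code = {}        # address -> compiled function (stand-in for machine code)
jump_table = {}  # function name -> address of its current compiled code

def install(name, address, compiled):
    """Place compiled code at an address and point the jump table at it."""
    code[address] = compiled
    jump_table[name] = address

def call(name):
    """Every call goes through the jump table."""
    return code[jump_table[name]]()

install("f", 0x1000, lambda: "f: optimized")
install("g", 0x2000, lambda: "g")

trace = [call("f"), call("g")]

# De-opt: the optimized f is now unsound, so re-compile it to a fresh address.
install("f", 0x3000, lambda: "f: recompiled")
trace.append(call("f"))

print(trace)
```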

slide-22
SLIDE 22

19/41

ARMv8: How to safely modify code?

slide-23
SLIDE 23

20/41

RISC-V/x86/Power: How to?

Similar for IBM Power
Much easier on x86
RISC-V not decided yet . . .

Focus on ARMv8-A for rest of talk. . .

slide-24
SLIDE 24

21/41

An Instruction Fetching Test

Memory
    Write f = “print(2)”
    CALL f
    . . .
f : print(1)
    RETURN
    . . .

Overwrite code of function f
Then, Call f

slide-25
SLIDE 25

22/41

Real A64 Assembly

Thread 0
    STR W0,[X1]
    BL f

f
f:  B l0
l1: MOV X0,#2
    RET
l0: MOV X0,#1
    RET

Initial state: 0:W0="B l1", 0:X1=f
Allowed: 1:X0=1

Relaxed result observed in ~99% of experimental runs on multiple devices.

slide-26
SLIDE 26

23/41

An Architectural Model!

Source Code
    Write f = “print(2)”
    CALL f
    . . .
f : print(1)
    RETURN
    . . .

[Diagram: per-thread components Thread, Fetch Queue and Abstract icache; global components Abstract dcache and Memory; arrows labelled new fetch request, fetch, decode, add to icache, write data, read data]

Prefetching
Stale instructions
Data buffering


slide-28
SLIDE 28

24/41

Unexpected Coherence!

Thread A
    f = “print(2)”
    . . .

Thread B
    CALL f
    print(f)
    . . .

f : print(1)
    RETURN
    . . .

If f executes print(2), then print(f) must print the updated memory (2).

slide-29
SLIDE 29

25/41

Real A64 Assembly

Thread 0
    STR W0,[X1]

Thread 1
    BL f
    LDR X1,[X2]

f
f:  B l0
l1: MOV X0,#2
    RET
l0: MOV X0,#1
    RET

Initial state: 0:W0="B l1", 0:X1=f, 1:X2=f
Forbidden: 1:X0=2, 1:X1="B l0"

slide-30
SLIDE 30

26/41

Other Phenomena

Not Mentioned Here:
◮ (In)coherence
◮ Multiple images in I-cache
◮ Multiple images in D-cache(s)
◮ Direct Data Intervention
◮ Speculating cache maintenance
◮ O/S Migration
◮ and others . . .

slide-31
SLIDE 31

27/41

Operational Model

[Diagram: per-thread components Thread, Fetch Queue and Abstract icache; global components Abstract dcache and Memory; arrows labelled new fetch request, fetch, decode, add to icache, write data, read data]

slide-32
SLIDE 32

28/41

Operational State

m :
  • ts : tid → instruction_tree
  • ss : storage_subsystem

storage_subsystem :
  • mem : write list
  • icache : tid → write set
  • dcache : write list
  • . . .
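A much-simplified rendering of this state shows why a fetch can return a stale instruction. Everything in the sketch is an assumption made for illustration: `ToyMachine` and its method names are hypothetical, each icache holds a single value per address instead of a set of writes, fetches fill only from memory, and invalidation clears every thread's icache entry.

```python
class ToyMachine:
    """Toy storage subsystem: memory, one shared dcache, per-thread icaches."""

    def __init__(self, memory):
        self.mem = dict(memory)   # addr -> value
        self.dcache = []          # buffered writes (addr, value), oldest first
        self.icache = {}          # tid -> {addr: value}

    def write(self, addr, value):
        # Data writes are buffered in the abstract dcache.
        self.dcache.append((addr, value))

    def read(self, addr):
        # Data reads see the newest buffered write, otherwise memory.
        for a, v in reversed(self.dcache):
            if a == addr:
                return v
        return self.mem[addr]

    def dc_clean(self, addr):
        # Flow buffered writes to this address into memory.
        keep = []
        for a, v in self.dcache:
            if a == addr:
                self.mem[a] = v
            else:
                keep.append((a, v))
        self.dcache = keep

    def ic_invalidate(self, addr):
        # Drop the address from every thread's icache.
        for lines in self.icache.values():
            lines.pop(addr, None)

    def fetch(self, tid, addr):
        # Instruction fetch reads the icache, filling it from memory on a miss.
        lines = self.icache.setdefault(tid, {})
        if addr not in lines:
            lines[addr] = self.mem[addr]
        return lines[addr]

m = ToyMachine({"f": "B l0"})
m.fetch(0, "f")                 # prefetch caches the old instruction
m.write("f", "B l1")            # overwrite the code of f
stale = m.fetch(0, "f")         # still the old instruction
data = m.read("f")              # a data read already sees the new one
m.dc_clean("f")
m.ic_invalidate("f")
fresh = m.fetch(0, "f")
print(stale, "|", data, "|", fresh)
```

Even without the prefetch, the fetch after `write` would fill from memory and see the old instruction, because the new write is still buffered in the dcache.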

slide-33
SLIDE 33

29/41

Thread State

Sequential ISA Spec Explicit Speculation


slide-36
SLIDE 36

30/41

Operational: Transitions

Transitions:
◮ Step ISA Spec
◮ Memory Read/Write
◮ . . .
◮ Fetch Request
◮ Fetch Instruction (from icache)
◮ Decode Instruction
◮ . . .
◮ Update Instruction Cache
◮ Flow Writes into Memory
◮ Reset Instruction

*exact names may vary

New!

slide-37
SLIDE 37

31/41

Operational Rule (prose)

Flow Writes into Memory

An instruction i in the state Perform_DC(address, state_cont) can complete if all po-previous DMB ISH and DSB ISH instructions have finished.

Action:
  1. For the most recent writes ws which are in the same data cache line of minimum size in the abstract data cache as address, update the memory with ws;
  2. Remove all those writes from the abstract data cache.
  3. Set the state of i to Plain(state_cont).
slide-38
SLIDE 38

32/41

Operational Rule (lem)

let flat_propagate_dc params state _cmr addr =
  (* remove all to that cacheline from buffer *)
  let (overlapping, fetch_buf) =
    List.partition
      (write_overlaps_with_addr (cache_line_fp addr))
      state.flat_ss_fetch_buf in
  (* flow the overlapping writes into memory *)
  List.foldr
    (fun write state -> flat_write_to_memory params state write)
    (<| state with flat_ss_fetch_buf = fetch_buf |>)
    overlapping
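The same partition-then-fold shape can be written in Python for readers less familiar with Lem. This is a loose analogue, not a translation of the rmem code: `propagate_dc`, `cache_line` and the 64-byte `CACHE_LINE_BYTES` are hypothetical, writes are plain (address, value) pairs, and the buffer is assumed to be ordered oldest first so that the newest write to an address wins.

```python
CACHE_LINE_BYTES = 64  # assumed line size, for illustration only

def cache_line(addr):
    """Index of the cache line containing addr."""
    return addr // CACHE_LINE_BYTES

def propagate_dc(memory, buffer, addr):
    """Flow buffered writes on addr's cache line into memory.

    Returns (new_memory, remaining_buffer); the inputs are not modified.
    """
    line = cache_line(addr)
    # Partition the buffer into writes on this line and all the others.
    overlapping = [w for w in buffer if cache_line(w[0]) == line]
    remaining = [w for w in buffer if cache_line(w[0]) != line]
    # Fold the overlapping writes into memory, oldest first.
    new_memory = dict(memory)
    for a, v in overlapping:
        new_memory[a] = v
    return new_memory, remaining

memory = {0x1000: 0, 0x1008: 0, 0x2000: 0}
buffer = [(0x1000, 1), (0x2000, 7), (0x1000, 2), (0x1008, 3)]
new_memory, remaining = propagate_dc(memory, buffer, 0x1004)
print(new_memory, remaining)
```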
slide-39
SLIDE 39

33/41

RMEM

https://www.cl.cam.ac.uk/~pes20/rmem/

slide-40
SLIDE 40

34/41

Axiomatic-Style Model

let iseq = [W];(wco&scl);[DC];(wco&scl);[IC]

(* Observed-by *)
let obs = rfe | fr | wco | irf | (ifr;iseq)

(* Fetch-ordered-before *)
let fob = [IF]; fpo; [IF]
  | [IF]; fe
  | [ISB]; fe^-1; fpo

(* Dependency-ordered-before *)
let dob = addr | data
  | ctrl; [W]
  | (ctrl | (addr; po)); [ISB]
  | addr; po; [W]
  | (addr | data); rfi

(* Atomic-ordered-before *)
let aob = rmw
  | [range(rmw)]; rfi; [A|Q]

(* Barrier-ordered-before *)
let bob = [R|W]; po; [dmb.sy]
  | [dmb.sy]; po; [R|W]
  | [L]; po; [A]
  | [R]; po; [dmb.ld]
  | [dmb.ld]; po; [R|W]
  | [A|Q]; po; [R|W]
  | [W]; po; [dmb.st]
  | [dmb.st]; po; [W]
  | [R|W]; po; [L]
  | [R|W|F|DC|IC]; po; [dsb.ish]
  | [dsb.ish]; po; [R|W|F|DC|IC]
  | [dmb.sy]; po; [DC]

(* Cache-op-ordered-before *)
let cob = [R|W]; (po&scl); [DC]
  | [DC]; (po&scl); [DC]

(* Ordered-before *)
let ob = obs|fob|dob|aob|bob|cob

(* Internal visibility requirement *)
acyclic (po-loc|fr|co|rf) as internal

(* External visibility requirement *)
acyclic ob as external

(* Atomic *)
empty rmw & (fre; coe) as atomic

(* Constrained unpredictable *)
let cff = ([W];loc;[IF]) \ ob+^-1 \ (co;iseq;ob+)
cff_bad cff ≡ CU


slide-42
SLIDE 42

36/41

Axiomatic ifetch: an example

Thread 0
    STR W0,[X1]   // (b)
    DC CVAU,X1    // (d)
    DSB ISH
    IC IVAU,X1    // (h)
    DSB ISH
    ISB           // (l)
    BL f          // (m)

Initial state: W0="B l1", X1=f
Forbidden: X0=1

Events (Thread 0):
a: fetch; b: write f=B l1; c: fetch; d: DC; e: fetch; f: DSB; g: fetch; h: IC; i: fetch; j: DSB; k: fetch; l: ISB; m: fetch f=B l0

Edges: fpo (×6), po (×5), fe (×6)

slide-43
SLIDE 43

37/41

A Forbidden Instruction Fetch

Events (Thread 0):
a: fetch; b: write f=B l1; c: fetch; d: DC; e: fetch; f: DSB; g: fetch; h: IC; i: fetch; j: DSB; k: fetch; l: ISB; m: fetch f=B l0

Edge labels shown as the slide builds up, step by step:
  1. fpo (×6), po (×5), fe (×6), wco, wco, ifr
  2. fpo, po, po, fe, wco, wco, ifr
  3. fpo, po, po, fe, iseq, ifr
  4. fpo, bob, fe, iseq, ifr
  5. bob, fob, iseq, ifr
  6. bob, fob, obs
  7. ob, ob, ob

let iseq = [W];(wco&scl);[DC];(wco&scl);[IC]

(* Observed-by *)
let obs = irf | (ifr;iseq)

(* Fetch-ordered-before *)
let fob = [IF]; fpo; [IF]
  | [IF]; fe
  | [ISB]; fe^-1; fpo

(* Barrier-ordered-before *)
let bob = . . .
  | [R|W|F|DC|IC]; po; [dsb.ish]
  | [dsb.ish]; po; [R|W|F|DC|IC]

(* Ordered-before *)
let ob = obs | fob | bob

(* External visibility requirement *)
acyclic ob
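The cycle in ob can be reproduced with a small relation calculator. This is a sketch under several assumptions: relations are Python sets of pairs, the helper names (`seq`, `ident`, `inverse`, `closure`, `acyclic`) are hypothetical, event names follow the labels a to m, the endpoints of the wco and ifr edges (b → d, d → h, m → b) are one reading of the diagram, and ISB is counted as a member of F so that the DSB-to-ISB edge falls under bob.

```python
def seq(*rels):
    """Relational composition r1 ; r2 ; ... over sets of pairs."""
    out = rels[0]
    for r in rels[1:]:
        out = {(a, d) for (a, b) in out for (c, d) in r if b == c}
    return out

def ident(events):
    """[S]: the identity relation restricted to the set S."""
    return {(e, e) for e in events}

def inverse(r):
    return {(b, a) for (a, b) in r}

def closure(r):
    """Transitive closure of r."""
    result = set(r)
    while True:
        extra = seq(result, result) - result
        if not extra:
            return result
        result |= extra

def acyclic(r):
    return all(a != b for (a, b) in closure(r))

fetches = ["a", "c", "e", "g", "i", "k", "m"]
instrs = ["b", "d", "f", "h", "j", "l"]
IF, W, DC, IC = set(fetches), {"b"}, {"d"}, {"h"}
DSB, ISB = {"f", "j"}, {"l"}

fpo = closure(set(zip(fetches, fetches[1:])))   # fetch program order
po = closure(set(zip(instrs, instrs[1:])))      # program order
fe = set(zip(fetches, instrs))                  # fetch-to-execute
wco = {("b", "d"), ("d", "h")}                  # assumed endpoints
scl = {(x, y) for x in W | DC | IC for y in W | DC | IC}  # same cache line
ifr = {("m", "b")}                              # m fetched the overwritten value
irf = set()

iseq = seq(ident(W), wco & scl, ident(DC), wco & scl, ident(IC))
obs = irf | seq(ifr, iseq)
fob = (seq(ident(IF), fpo, ident(IF))
       | seq(ident(IF), fe)
       | seq(ident(ISB), inverse(fe), fpo))
ordered = W | DC | IC | ISB                     # R|W|F|DC|IC, with ISB in F (assumption)
bob = seq(ident(ordered), po, ident(DSB)) | seq(ident(DSB), po, ident(ordered))
ob = obs | fob | bob

print(iseq, obs, acyclic(ob))
```

Under these assumptions the cycle is m → h (obs), h → j → l (bob), l → m (fob), so `acyclic(ob)` is false and the execution in which `f` still fetches `B l0` is forbidden. Dropping `obs` leaves `fob | bob` acyclic.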

slide-56
SLIDE 56

38/41

Modelling Process

Create Model; Write Tests; Run Tests; Talk to Architects

slide-57
SLIDE 57

39/41

Validation

Validating the model:
◮ approx. 35 hand-written tests.
◮ approx. 1500 auto-generated tests.

Ran on multiple devices and compared results to our models:
Found some hardware bugs;
Many places hardware not as relaxed as architecture allows!

slide-58
SLIDE 58

40/41

Future

◮ Exceptions and Interrupts
◮ Pagetables and TLBs
◮ Devices, DMA, Non-Volatile Memory

slide-59
SLIDE 59

41/41

End

So far:
◮ Re-cap “relaxed-memory” / x86-TSO.
◮ Operational & “Axiomatic” models
◮ JIT usage
◮ Arm self-modifying code
◮ ARMv8 Architectural Operational Model
◮ ARMv8 Axiomatic Model
◮ Modelling and validation