Verifying fence elimination optimisations
Viktor Vafeiadis, MPI-SWS
Francesco Zappa Nardelli, INRIA
http://www.cl.cam.ac.uk/~pes20/CompCertTSO

CompCertTSO
[Figure: the CompCertTSO compilation chain. Languages: ClightTSO, C#minor, Cstacked, Cminor, CminorSel, RTL, LTL, LTLin, Linear, Machabstr, Machconc, x86. Pass labels: simplify, local vars, simplify, instruction selection, CFG generation, const prop., CSE, register allocation, branch tunnelling, linearize, reload/spill, act.records.] [POPL 2011]

CompCertTSO + fence optimisations
[Figure: the same compilation chain, with three additional passes at the RTL level: FE1, PRE and FE2.]

Language semantics
The semantics of all the CompCertTSO languages is defined by:
– a type of programs,
– a type of states,
– a set of initial states for each program,
– a transition relation, whose transitions are labelled call, return, fail, oom, or τ.

Traces
– Infinite sequences of call & return events;
– Finite sequences of call & return events ending with:
    end: successful termination,
    inftau: infinite execution that stops performing visible events,
    oom: execution runs out of memory.
NB: Erroneous computations become undefined after the first error.

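Compiler correctness
The compiler maps a source program (e.g., C) to a target program (e.g., x86), and must satisfy:
    traces(source_program) ⊇ traces(target_program)
Examples (source → target):
– print “a” || print “b” → print “ab”: allowed
– print “ab” → print “a” || print “b”: not allowed
– fail → print “ab”: allowed
– print “ab” → fail: not allowed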
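The trace inclusion above can be checked mechanically on the toy print examples. The following is a minimal sketch in Python, assuming a program is modelled simply as its set of traces, that print “ab” emits the two events a and b, and that a source trace ending in fail permits any continuation (per the NB on erroneous computations). The names interleavings and refines are illustrative and are not part of CompCertTSO.

```python
FAIL = "fail"

def interleavings(a, b):
    """All interleavings of two event sequences, given as tuples."""
    if not a:
        return {b}
    if not b:
        return {a}
    return ({(a[0],) + rest for rest in interleavings(a[1:], b)} |
            {(b[0],) + rest for rest in interleavings(a, b[1:])})

def refines(source_traces, target_traces):
    """True if every target trace is permitted by the source.

    A target trace is permitted if it is a source trace, or if some
    source trace ends in FAIL and agrees with it up to that error.
    """
    def allowed(t):
        if t in source_traces:
            return True
        return any(s and s[-1] == FAIL and t[:len(s) - 1] == s[:-1]
                   for s in source_traces)
    return all(allowed(t) for t in target_traces)

PAR = interleavings(("a",), ("b",))  # print "a" || print "b"
SEQ = {("a", "b")}                   # print "ab"
ERR = {(FAIL,)}                      # fail

print(refines(PAR, SEQ))  # compiling the parallel program to print "ab"
print(refines(SEQ, PAR))  # compiling print "ab" to the parallel program
```

The source is the first argument, so refines(PAR, SEQ) asks whether the sequential target only exhibits behaviours that the parallel source already had.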

Store buffering

    Thread 0            Thread 1
    MOV [x] ← 1         MOV [y] ← 1
    MOV EAX ← [y]       MOV EBX ← [x]
    ...

Each thread has its own write buffer in front of the shared memory. The slides step through one execution:
1. Initially: EAX : 32, EBX : 47; both write buffers empty; shared memory x : 0, y : 0.
2. Thread 0's write buffer holds x:1; shared memory x : 0, y : 0.
3. The write buffers hold x:1 and y:1; shared memory x : 0, y : 0.
4. EAX : 0, EBX : 0; Thread 1's write buffer still holds y:1; shared memory x : 1, y : 0.
5. EAX : 0, EBX : 0; both write buffers empty; shared memory x : 1, y : 1.

Store buffering + fences

    Thread 0            Thread 1
    MOV [x] ← 1         MOV [y] ← 1
    MFENCE              MFENCE
    MOV EAX ← [y]       MOV EBX ← [x]
    ...

MFENCE blocks until the thread buffer is empty. The slides step through one execution:
1. Initially: EAX : 32, EBX : 47; both write buffers empty; shared memory x : 0, y : 0.
2. Thread 0's write buffer holds x:1; shared memory x : 0, y : 0.
3. The write buffers hold x:1 and y:1; shared memory x : 0, y : 0.
4. EAX : 32, EBX : 47; Thread 1's write buffer still holds y:1; shared memory x : 1, y : 0.
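The store-buffer behaviour above can be reproduced by exhaustively exploring a small operational model. The sketch below is written in Python under simplifying assumptions: one FIFO write buffer per thread, a load reads the newest pending write to its address in the local buffer and otherwise shared memory, and MFENCE can only execute when the local buffer is empty. The instruction encoding and the function name outcomes are illustrative, not taken from CompCertTSO.

```python
def outcomes(threads, init_regs, init_mem):
    """Return the set of final register states of a tiny TSO-like model.

    Instructions: ("store", addr, value), ("load", reg, addr), ("mfence",).
    """
    n = len(threads)
    results, seen = set(), set()

    def explore(pcs, bufs, mem, regs):
        state = (pcs, bufs, mem, regs)
        if state in seen:
            return
        seen.add(state)
        if all(pcs[i] == len(threads[i]) and not bufs[i] for i in range(n)):
            results.add(regs)
            return
        for i in range(n):
            # The oldest buffered write of thread i may reach shared memory.
            if bufs[i]:
                addr, val = bufs[i][0]
                new_mem = tuple(sorted({**dict(mem), addr: val}.items()))
                explore(pcs, bufs[:i] + (bufs[i][1:],) + bufs[i + 1:],
                        new_mem, regs)
            # Thread i may execute its next instruction.
            if pcs[i] < len(threads[i]):
                ins = threads[i][pcs[i]]
                nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if ins[0] == "store":
                    buf = bufs[i] + ((ins[1], ins[2]),)
                    explore(nxt, bufs[:i] + (buf,) + bufs[i + 1:], mem, regs)
                elif ins[0] == "load":
                    pending = [v for (a, v) in bufs[i] if a == ins[2]]
                    val = pending[-1] if pending else dict(mem)[ins[2]]
                    new_regs = tuple(sorted({**dict(regs), ins[1]: val}.items()))
                    explore(nxt, bufs, mem, new_regs)
                elif ins[0] == "mfence" and not bufs[i]:
                    # MFENCE is blocked while the local buffer is non-empty.
                    explore(nxt, bufs, mem, regs)

    explore((0,) * n, ((),) * n,
            tuple(sorted(init_mem.items())), tuple(sorted(init_regs.items())))
    return results

REGS = {"EAX": 32, "EBX": 47}
MEM = {"x": 0, "y": 0}
SB = [[("store", "x", 1), ("load", "EAX", "y")],
      [("store", "y", 1), ("load", "EBX", "x")]]
SB_FENCED = [[("store", "x", 1), ("mfence",), ("load", "EAX", "y")],
             [("store", "y", 1), ("mfence",), ("load", "EBX", "x")]]

BOTH_ZERO = (("EAX", 0), ("EBX", 0))
print(BOTH_ZERO in outcomes(SB, REGS, MEM))
print(BOTH_ZERO in outcomes(SB_FENCED, REGS, MEM))
```

In this model the outcome EAX = EBX = 0 is reachable for the unfenced program and unreachable once both threads fence between their store and their load.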

Who inserts fences?
1. The programmer, explicitly. Example: Fraser's lockfree-lib:

    /*
     * II. Memory barriers.
     *  MB(): All preceding memory accesses must commit before any later accesses.
     *
     * If the compiler does not observe these barriers (but any sane compiler
     * will!), then VOLATILE should be defined as 'volatile'.
     */
    #define MB() __asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")

2. The compiler, to implement a high-level memory model, e.g. SEQ_CST C++0x low-level atomics on x86:
    Load SEQ_CST: MFENCE; MOV
    Store SEQ_CST: MOV; MFENCE

Fence instructions
1. Fences are necessary to implement locks & not fully-commutative linearizable objects (e.g., stacks, queues, sets, maps). [Attiya et al., POPL 2011]
2. Fences can be expensive.

Redundant fences (1)
If we have two consecutive fence instructions, we can remove the latter:

    MFENCE        MFENCE
    MFENCE   →    NOP

The buffer is already empty when the second fence is executed.
Generalisation:

    MFENCE                 MFENCE
    NON-WRITE INSTR        NON-WRITE INSTR
    …                 →    …
    NON-WRITE INSTR        NON-WRITE INSTR
    MFENCE                 NOP

FE1
A fence is redundant if it always follows a previous fence or locked instruction in program order, and no memory store instructions are in between.
A forward data-flow problem over the boolean domain {⊥, ⊤}. Associate to each program point:
    ⊥: along all execution paths there is an atomic instruction before the current program point, with no intervening writes;
    ⊤: otherwise.
Implementation:
1. Use CompCert's implementation of Kildall's algorithm to solve the data-flow equations.
2. Replace MFENCEs for which the analysis returns ⊥ with NOP instructions.
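The FE1 analysis can be sketched as a small worklist solver. This is an illustration in Python, not CompCert's Kildall implementation: the CFG encoding (a dict from node to instruction and successor list), the instruction names LOCKED, STORE, LOAD and OP, and the function names are all assumptions made for the example. Every point starts at ⊥ except the entry, and values are only ever raised to ⊤.

```python
from collections import deque

BOT, TOP = False, True  # BOT: fenced on every path, no write since

def transfer_fe1(instr, before):
    """Forward transfer function for one instruction."""
    if instr in ("MFENCE", "LOCKED"):
        return BOT          # buffer is empty after an atomic instruction
    if instr == "STORE":
        return TOP          # a write may now be pending in the buffer
    return before           # non-write instructions leave the fact unchanged

def fe1(cfg, entry):
    """cfg maps node -> (instruction, successors). Returns the optimised CFG."""
    before = {n: BOT for n in cfg}
    before[entry] = TOP     # nothing is known at the function entry
    work = deque(cfg)
    while work:
        n = work.popleft()
        instr, succs = cfg[n]
        after = transfer_fe1(instr, before[n])
        for s in succs:
            joined = before[s] or after   # TOP if any predecessor says TOP
            if joined != before[s]:
                before[s] = joined
                work.append(s)
    return {n: ("NOP" if instr == "MFENCE" and before[n] == BOT else instr, succs)
            for n, (instr, succs) in cfg.items()}

# A diamond: one branch stores, the other does not.
CFG1 = {
    1: ("MFENCE", [2]),
    2: ("LOAD", [3, 4]),
    3: ("STORE", [5]),
    4: ("OP", [5]),
    5: ("MFENCE", [6]),   # kept: the path through node 3 has a write
    6: ("OP", [7]),
    7: ("MFENCE", []),    # removed: follows node 5 with no write in between
}
OPT1 = fe1(CFG1, entry=1)
print({n: i for n, (i, _) in OPT1.items()})
```

Only the fence at node 7 becomes a NOP: the fence at node 5 must stay because one of the two paths reaching it contains a store.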

Redundant fences (2)
If we have two consecutive fence instructions, we can remove the former:

    MFENCE        NOP
    MFENCE   →    MFENCE

Intuition: the visible effects initially published by the former fence are now published by the latter, and nobody can tell the difference.
Generalisation:

    MFENCE                 NOP
    INSTRUCTION 1          INSTRUCTION 1
    …                ???   …
    INSTRUCTION n          INSTRUCTION n
    MFENCE                 MFENCE

Redundant fences (2)
If there are reads in between the fences…

    Thread 0            Thread 1
    MOV [x] ← 1         MOV [y] ← 1
    MFENCE              MFENCE
    MOV EAX ← [y]       MOV EBX ← [x]
    MFENCE

With [x]=[y]=0, the outcome EAX = EBX = 0 is forbidden, but

    Thread 0            Thread 1
    MOV [x] ← 1         MOV [y] ← 1
    NOP                 MFENCE
    MOV EAX ← [y]       MOV EBX ← [x]
    MFENCE

With [x]=[y]=0, the outcome EAX = EBX = 0 is allowed.
If there are reads in between, the optimisation is unsound.

Redundant fences (2)
Swapping a STORE and an MFENCE is sound:

    MFENCE; STORE   →   STORE; MFENCE

1. The transformed program's behaviours ⊆ the source program's behaviours (the source program might leave a pending write in its buffer).
2. There is a new intermediate state if the buffer was initially non-empty, but this intermediate state is not observable (a local read is needed to access the local buffer).
Intuition: Iterate this swapping ...

FE2
A fence is redundant if it always precedes a later fence or locked instruction in program order, and no memory read instructions are in between.
A backward data-flow problem over the boolean domain {⊥, ⊤}. Associate to each program point:
    ⊥: along all execution paths there is an atomic instruction after the current program point, with no intervening reads;
    ⊤: otherwise.
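The FE2 analysis is the mirror image and can be sketched the same way, propagating facts from successors to predecessors. This Python sketch is illustrative only: the CFG encoding, the instruction names and the choice to treat a node without successors as ⊤ are assumptions, as is starting every other point at ⊥. The example CFG is loop-free, so the sketch says nothing about how non-terminating loops should be treated.

```python
from collections import deque

BOT, TOP = False, True  # BOT: a fence follows on every path, no read before it

def fe2(cfg):
    """cfg maps node -> (instruction, successors). Returns the optimised CFG."""
    preds = {n: [] for n in cfg}
    for n, (_, succs) in cfg.items():
        for s in succs:
            preds[s].append(n)

    # after[n] is the fact at the program point just after node n.
    # Assumption: at an exit (no successors) no later fence exists, so TOP.
    after = {n: (BOT if succs else TOP) for n, (_, succs) in cfg.items()}

    def before(n):
        """Backward transfer: fact just before node n."""
        instr = cfg[n][0]
        if instr in ("MFENCE", "LOCKED"):
            return BOT      # this instruction is itself the later fence
        if instr == "LOAD":
            return TOP      # a read could observe the delayed writes
        return after[n]     # stores and non-memory instructions pass through

    work = deque(cfg)
    while work:
        n = work.popleft()
        fact = before(n)
        for p in preds[n]:
            joined = after[p] or fact   # TOP if any successor says TOP
            if joined != after[p]:
                after[p] = joined
                work.append(p)
    return {n: ("NOP" if instr == "MFENCE" and after[n] == BOT else instr, succs)
            for n, (instr, succs) in cfg.items()}

CFG2 = {
    1: ("MFENCE", [2]),   # removed: only a store and an op before node 4
    2: ("STORE", [3]),
    3: ("OP", [4]),
    4: ("MFENCE", [5]),   # kept: a read sits between it and node 6
    5: ("LOAD", [6]),
    6: ("MFENCE", []),    # kept: nothing follows it
}
OPT2 = fe2(CFG2)
print({n: i for n, (i, _) in OPT2.items()})
```

Here the first fence is removed because the fence at node 4 publishes the same writes, while the fence at node 4 stays because the load at node 5 could otherwise observe the difference.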

Informal correctness argument
Intuition: FE2 can be thought of as iterating

    MFENCE; STORE     →   STORE; MFENCE
    MFENCE; non-mem   →   non-mem; MFENCE

and then applying

    MFENCE; MFENCE    →   NOP; MFENCE

This argument works for finite traces, but not for infinite traces, as the later fence might never be executed:

    MFENCE;          NOP;
    STORE;      →    STORE;
    WHILE(1);        WHILE(1);
    MFENCE           MFENCE

Basic simulations
A pair of relations […] is a basic simulation for […] if: […]
Exhibiting a basic simulation implies:
    traces(compile(p)) \ { t · inftau | t trace } ⊆ traces(p)
“simulation can stutter forever”

Usual approach: measured simulations

Simulation for FE2
– s ≡_i t iff thread i of s and t have identical pc, local states and buffers.
– s ↝_i s' iff thread i of s can execute zero or more NOP, OP, STORE and MFENCE instructions and end in the state s'.
– s ~ t iff
    – t's CFG is the optimised version of s's CFG; and
    – s and t have identical memories; and
    – ∀ thread i, either s ≡_i t, or the analysis for i's pc returned ⊥ and ∃ s', s ↝_i s' and s' ≡_i t (“s is some instructions behind and can catch up”).
Stutter condition: t > t' iff t → t' by a thread executing a NOP, OP, STORE or MFENCE (and t's buffer being non-empty).
