

  1. Weak memory models

  2. INF4140 - Models of concurrency: Weak memory models. Fall 2016, 30.10.2016

  3. Overview
     1 Introduction
     2 Hardware architectures; compiler optimizations; sequential consistency
     3 Weak memory models: the TSO memory model (SPARC, x86-TSO); the ARM and POWER memory model; the Java memory model; the Go memory model
     4 Summary and conclusion

  4. Introduction

  5. Concurrency. “Concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other” (Wikipedia). Performance increase, better latency. Many forms of concurrency/parallelism: multi-core, multi-threading, multi-processors, distributed systems.

  6. Shared memory: a simplistic picture. One way of “interacting” (i.e., communicating and synchronizing): via shared memory. [figure: threads 0 and 1 reading/writing a common shared memory] A number of threads/processors access a common memory/address space, interacting by sequences of shared memory reads/writes (or loads/stores, etc.). However: considerably harder to get correct and efficient programs.

  7. Dekker’s solution to mutex. As is well known, shared memory programming requires synchronization, e.g., mutual exclusion. Dekker’s algorithm is simple and the first known mutex algorithm; it is shown here simplified. Initially: flag0 = flag1 = 0.

     thread 0            thread 1
     flag0 := 1;         flag1 := 1;
     if (flag1 = 0)      if (flag0 = 0)
       then CRITICAL       then CRITICAL

  8. Dekker’s solution to mutex (same algorithm as above). Known textbook “fact”: Dekker’s is a software-based solution to the mutex problem (or is it?)

  9. Dekker’s solution to mutex (same algorithm as above). The moral: programmers need to know concurrency. A concrete illustration follows below.
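
  To make this concrete, here is a minimal Java sketch (illustrative only; class and field names are not from the lecture) that runs the simplified algorithm many times and checks mutual exclusion. Because flag0 and flag1 are plain fields, the compiler and hardware may effectively reorder each thread's store past its subsequent load, so both threads can occasionally read 0 and enter the critical section together:

     public class DekkerLitmus {
         static int flag0, flag1;     // plain fields: no ordering guarantees
         static boolean in0, in1;     // "entered the critical section"

         public static void main(String[] args) throws InterruptedException {
             for (int run = 0; run < 100_000; run++) {
                 flag0 = 0; flag1 = 0; in0 = false; in1 = false;
                 Thread t0 = new Thread(() -> { flag0 = 1; if (flag1 == 0) in0 = true; });
                 Thread t1 = new Thread(() -> { flag1 = 1; if (flag0 == 0) in1 = true; });
                 t0.start(); t1.start();
                 t0.join();  t1.join();
                 if (in0 && in1) {    // both in CRITICAL: mutual exclusion violated
                     System.out.println("violation in run " + run);
                     return;
                 }
             }
             System.out.println("no violation observed (it remains allowed, though)");
         }
     }

  Declaring the two flags volatile would forbid the problematic reordering under the Java memory model; the Java memory model part below returns to this.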

  10. A three process example. Initially: x = y = 0; r: register, local var.

     thread 0     thread 1        thread 2
     x := 1       if (x = 1)      if (y = 1)
                    then y := 1     then r := x

     “Expected” result: upon termination, register r of the third thread will contain r = 1.

  11. A three process example (same program as above). But: who ever said that there is only one identical copy of x that thread 1 and thread 2 operate on? (A runnable version of the example is sketched below.)
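
  As a sketch (illustrative Java, not from the lecture), the example can be written down directly; with plain fields, nothing guarantees the “expected” outcome on a weak memory model:

     public class ThreeProcess {
         static int x, y, r;   // shared fields; r plays the role of the register

         public static void main(String[] args) throws InterruptedException {
             Thread t0 = new Thread(() -> { x = 1; });
             Thread t1 = new Thread(() -> { if (x == 1) y = 1; });
             Thread t2 = new Thread(() -> { if (y == 1) r = x; });
             t0.start(); t1.start(); t2.start();
             t0.join(); t1.join(); t2.join();
             // "Expected": if thread 2 saw y = 1, it also sees x = 1, hence r = 1.
             // On a sufficiently weak memory model, thread 2 may see y = 1 but x = 0.
             System.out.println("r = " + r);
         }
     }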

  12. Shared memory concurrency in the real world. The simplistic memory architecture above does not reflect reality. [figure: threads 0 and 1 connected to shared memory through buffers] Out-of-order executions arise for two interdependent reasons: 1. modern HW: complex memory hierarchies, caches, buffers...; 2. compiler optimizations.

  13. SMP, multi-core architecture, and NUMA. [figures: an SMP machine (CPUs 0–3, each with an L1 cache, over shared memory); a multi-core machine (per-core L1 and shared L2 caches, over shared memory); a NUMA machine (each CPU with its own local memory)]

  14. “Modern” HW architectures and performance

     public class TASLock implements Lock {
         ...
         public void lock() {
             while (state.getAndSet(true)) { }   // spin
         }
         ...
     }

     public class TTASLock implements Lock {
         ...
         public void lock() {
             while (true) {
                 while (state.get()) { }         // spin
                 if (!state.getAndSet(true))
                     return;
             }
         }
         ...
     }
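
  For context, an assumption about the elided “...” parts, following Herlihy and Shavit's textbook version rather than anything shown in the slides: state is an AtomicBoolean, and unlock() simply resets it. A stand-alone sketch, with the textbook's own Lock interface dropped so it compiles by itself:

     import java.util.concurrent.atomic.AtomicBoolean;

     public class TTASLock {
         private final AtomicBoolean state = new AtomicBoolean(false);

         public void lock() {
             while (true) {
                 while (state.get()) { }        // spin on a (locally cached) read
                 if (!state.getAndSet(true))    // attempt to acquire
                     return;
             }
         }

         public void unlock() {
             state.set(false);
         }
     }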

  15. Observed behavior. [plot: lock acquisition time vs. number of threads for TASLock, TTASLock, and an ideal lock; TASLock degrades the most, the ideal lock stays flat] (cf. [Anderson, 1990] [Herlihy and Shavit, 2008, p. 470]). The gap is a cache effect: TTASLock spins on a locally cached read, while TASLock’s repeated getAndSet keeps invalidating the other processors’ caches.

  16. Compiler optimizations. Many optimizations, of different forms: elimination of reads, writes, sometimes synchronization statements; re-ordering of independent, non-conflicting memory accesses; introduction of reads. Examples: constant propagation, common subexpression elimination, dead-code elimination, loop optimizations, call inlining... and many more.

  17. Code reordering

     Initially: x = y = 0                    Initially: x = y = 0

     thread 0       thread 1        ⇒       thread 0       thread 1
     x := 1         y := 1;                 r1 := y        y := 1;
     r1 := y        r2 := x;                x := 1         r2 := x;
     print r1       print r2                print r1       print r2

     possible print-outs:                   possible print-outs:
     {(0,1), (1,0), (1,1)}                  {(0,0), (0,1), (1,0), (1,1)}
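
  The left program really can print (0,0) when run in Java with plain fields, i.e., it behaves like the reordered program on the right. A sketch (same run-many-times pattern as the Dekker sketch above; names are illustrative):

     public class ReorderingLitmus {
         static int x, y;

         public static void main(String[] args) throws InterruptedException {
             for (int run = 0; run < 100_000; run++) {
                 x = 0; y = 0;
                 int[] r = new int[2];
                 Thread t0 = new Thread(() -> { x = 1; r[0] = y; });
                 Thread t1 = new Thread(() -> { y = 1; r[1] = x; });
                 t0.start(); t1.start();
                 t0.join();  t1.join();
                 if (r[0] == 0 && r[1] == 0) {   // impossible under pure interleaving
                     System.out.println("(0,0) observed in run " + run);
                     return;
                 }
             }
         }
     }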

  18. Common subexpression elimination

     Initially: x = 0                  Initially: x = 0

     thread 0    thread 1      ⇒      thread 0    thread 1
     x := 1      r1 := x;             x := 1      r1 := x;
                 r2 := x;                         r2 := r1;
                 if r1 = r2                       if r1 = r2
                   then print 1                     then print 1
                   else print 2                     else print 2

     Is the transformation from the left to the right correct? Interleavings of the first program (W = write by thread 0, R = read by thread 1):
       W[x] := 1; R[x] = 1; R[x] = 1  ⇒  print(1)
       R[x] = 0;  W[x] := 1; R[x] = 1 ⇒  print(2)
       R[x] = 0;  R[x] = 0;  W[x] := 1 ⇒ print(1)
     2nd program: only 1 read from memory ⇒ only print(1) possible.

  19. Common subexpression elimination (same programs as above). Transformation left-to-right: ok. Transformation right-to-left: new observations (print(2) becomes possible), thus not ok. (A source-level sketch of the transformation follows below.)
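
  What the transformation looks like at the source level (an illustrative Java sketch, not lecture code); for sequential code, reusing r1 is clearly harmless, which is exactly the golden rule discussed next:

     class BeforeCse {
         static int x;                  // shared; a concurrent writer does x = 1
         static void reader() {
             int r1 = x;                // first load
             int r2 = x;                // second load: a write may intervene
             System.out.println(r1 == r2 ? 1 : 2);   // print(2) is possible
         }
     }

     class AfterCse {
         static int x;
         static void reader() {
             int r1 = x;                // single load
             int r2 = r1;               // common subexpression eliminated
             System.out.println(r1 == r2 ? 1 : 2);   // only print(1) is possible
         }
     }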

  20. Compiler optimizations. Golden rule of compiler optimization: change the code (for instance re-order statements, re-group parts of the code, etc.) in a way that leads to better performance (at least on average), but is otherwise unobservable to the programmer (i.e., does not introduce new observable results).

  21. Compiler optimizations (golden rule as above)... “unobservable” when executed single-threadedly, i.e., without concurrency! :-O In the presence of concurrency there are more forms of “interaction” ⇒ more effects become observable; standard optimizations become observable (i.e., “break” the code, assuming a naive, standard shared memory model).

  22. Is the Golden Rule outdated? The golden rule as task description for compiler optimizers: “Let’s assume, for convenience, that there is no concurrency; how can I make the code faster?” ... And if there’s concurrency? “Too bad, but not my fault ...”

  23. Is the Golden Rule outdated? That is an unfair characterization: it assumes a “naive” interpretation of shared variable concurrency (interleaving semantics, SMM).

  24. Is the Golden Rule outdated? What’s needed: the golden rule must(!) still be upheld, but: relax the naive expectations of what shared memory is ⇒ weak memory model. DRF: the golden rule is also the core of the “data-race free” programming principle (sketched below).
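
  A minimal Java sketch of the data-race-free idea (class name made up): if every access to shared state is protected by the same lock, the program has no data races, and the Java memory model then guarantees sequentially consistent behavior, so single-threaded compiler optimizations stay unobservable:

     class DrfCounter {
         private int value;             // shared state, only touched under the lock

         synchronized void increment() { value++; }
         synchronized int  current()   { return value; }
     }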

  25. Compilers vs. programmers

     Programmer: wants to understand the code/execution ⇒ profits from strong memory models.
     Compiler/HW: want to optimize the code (re-ordering memory accesses) ⇒ take advantage of weak memory models.

     ⇒ What are valid (semantics-preserving) compiler optimizations? What is a good memory model as compromise between the programmer’s needs and the chances for optimization?

  26. Sad facts and consequences. Incorrect concurrent code, “unexpected” behavior: Dekker’s (and other well-known mutex algorithms) is incorrect on modern architectures;¹ in the three-processor example, r = 1 is not guaranteed. Unclear/obtuse/informal hardware specifications; compiler optimizations may not be transparent. Understanding the memory architecture is also crucial for performance. Hence the need for an unambiguous description of the behavior of a chosen platform/language under shared memory concurrency ⇒ memory models. ¹ Actually, already since at least the IBM 370.

  27. Memory (consistency) model. What’s a memory model? “A formal specification of how the memory system will appear to the programmer, eliminating the gap between the behavior expected by the programmer and the actual behavior supported by a system.” [Adve and Gharachorloo, 1995]. A MM specifies: how threads interact through memory; what value a read can return; when a value update becomes visible to other threads; what assumptions one may make about memory when writing a program or applying some program optimization.
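
  The “when does an update become visible” question can be made tangible with a classic Java sketch (illustrative, not from the slides): with a plain boolean field the spinning thread may legally never observe the update; declaring the field volatile guarantees it eventually does:

     class VisibilityDemo {
         static volatile boolean done;   // without 'volatile', the loop may spin forever

         public static void main(String[] args) throws InterruptedException {
             Thread spinner = new Thread(() -> {
                 while (!done) { }       // busy-wait until the update becomes visible
                 System.out.println("update observed");
             });
             spinner.start();
             Thread.sleep(100);          // let the spinner enter its loop
             done = true;                // visible to the spinner: 'done' is volatile
             spinner.join();
         }
     }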
