Stalking the Lost Write: Memory Visibility in Concurrent Java




slide-1
SLIDE 1

Stalking the Lost Write: Memory Visibility in Concurrent Java

Jeff Berkowitz, New Relic
 QCon San Francisco, November 2014

slide-2
SLIDE 2

The Computer We Imagine

[Diagram: a single CPU connected to Memory by Write and Read paths]

. . .
statement-1;
statement-2;
if (b) statement-3;
while (cond) { statement-4; }
. . .

slide-3
SLIDE 3

The Compiler We Imagine

Java:
    x++;
    y++;

Assembly Language*:
    mov mem.x, reg1
    incr reg1
    mov reg1, mem.x
    mov mem.y, reg1
    incr reg1
    mov reg1, mem.y

* Typical assembly language - no particular CPU

slide-5
SLIDE 5

The Compiler We Get

Java:
    x++;
    y++;

Assembly Language:
    mov mem.x, reg1
    mov mem.y, reg2
    incr reg1
    mov reg1, mem.x
    incr reg2
    mov reg2, mem.y

slide-7
SLIDE 7

The End Result

Java:
    x++;
    y++;

Assembly Language:
    mov mem.x, reg1
    mov mem.y, reg2
    incr reg1
    mov reg1, mem.x
    incr reg2
    mov reg2, mem.y

Hardware Level*:
    rd.issue(x)
    rd.issue(y)
    resp.mov(r1)
    resp.mov(r2)
    incr r1
    wr.async(r1, x)
    incr r2
    wr.async(r2, y)

* Typical micro operations - no particular CPU

slide-9
SLIDE 9

The Multiprocessor We Imagine

[Diagram: two CPUs, each connected to one shared Memory by Write and Read paths]

There are no caches or memory buffering here

slide-10
SLIDE 10

Code Example 1

int x, y, a, b; // all zero

CPU 1: void m1() { y = a; b = 1; }
CPU 2: void m2() { x = b; a = 2; }

Possible outcomes for x and y?

slide-11
SLIDE 11

Possible Trace 1

Outcome: x == 1, y == 0

[Timeline diagram: m1() runs y = a, then b = 1; m2() runs x = b, then a = 2]

slide-12
SLIDE 12

Possible Trace 2

Outcome: x == 0, y == 0

[Timeline diagram: m1() runs y = a, then b = 1; m2() runs x = b, then a = 2]

slide-13
SLIDE 13

Possible Trace 3

Outcome: x == 0, y == 0

[Timeline diagram: m1() runs y = a, then b = 1; m2() runs x = b, then a = 2]

slide-14
SLIDE 14

Possible Trace 4

Outcome: x == 0, y == 0

[Timeline diagram: m1() runs y = a, then b = 1; m2() runs x = b, then a = 2]

slide-15
SLIDE 15

Possible Trace 5

Outcome: x == 0, y == 2

[Timeline diagram: m1() runs y = a, then b = 1; m2() runs x = b, then a = 2]

slide-16
SLIDE 16

Is That It?

  • It looks like x or y must be 0 in the result
  • Makes sense: the first statement of m1() grabs a 0, and so does the first statement of m2()
  • Is our reasoning correct?

int x, y, a, b; // all zero
void m1() { y = a; b = 1; }
void m2() { x = b; a = 2; }
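
The reasoning on this slide can be checked mechanically under the assumption that each thread's statements run in program order. The sketch below is an illustration only: the class name `ScInterleavings` and its helpers are hypothetical, not from the talk. It enumerates every interleaving of m1() and m2() and collects the final (x, y) pairs.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch: enumerates every in-program-order interleaving of m1() and m2(). */
final class ScInterleavings {
    // Indexes into the shared state array {x, y, a, b}; all start at zero.
    static final int X = 0, Y = 1, A = 2, B = 3;

    /** One statement, modeled as an update of the shared state. */
    interface Stmt { void run(int[] s); }

    /** Recursively tries "thread 1 goes next" and "thread 2 goes next". */
    private static void explore(Stmt[] t1, int i, Stmt[] t2, int j,
                                int[] state, Set<String> results) {
        if (i == t1.length && j == t2.length) {
            results.add("x=" + state[X] + ",y=" + state[Y]);
            return;
        }
        if (i < t1.length) {
            int[] next = state.clone();
            t1[i].run(next);
            explore(t1, i + 1, t2, j, next, results);
        }
        if (j < t2.length) {
            int[] next = state.clone();
            t2[j].run(next);
            explore(t1, i, t2, j + 1, next, results);
        }
    }

    /** Outcomes when both methods keep their source statement order. */
    static Set<String> programOrderOutcomes() {
        Stmt[] m1 = { s -> s[Y] = s[A], s -> s[B] = 1 };   // y = a; b = 1;
        Stmt[] m2 = { s -> s[X] = s[B], s -> s[A] = 2 };   // x = b; a = 2;
        Set<String> results = new TreeSet<>();
        explore(m1, 0, m2, 0, new int[4], results);
        return results;
    }

    public static void main(String[] args) {
        System.out.println(programOrderOutcomes());
    }
}
```

Under that assumption the only outcomes are (0, 0), (0, 2), and (1, 0), which matches the five traces shown so far. The enumeration says nothing about what a real compiler or CPU may do.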

slide-17
SLIDE 17

Surprisingly, No

Counterintuitively, the compiler can reverse the order

void m1() { y = a; b = 1; }
    mov #1, mem.b
    mov mem.a, mem.y

void m2() { x = b; a = 2; }
    mov #2, mem.a
    mov mem.b, mem.x

slide-18
SLIDE 18

Intuitive Trace

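Outcome: x == 0, y == 0

[Timeline diagram: m1() runs y = a, then b = 1; m2() runs x = b, then a = 2]

slide-19
SLIDE 19

Surprising Trace

Outcome: x == 1, y == 2

[Timeline diagram: statements of m1() (y = a, b = 1) and m2() (x = b, a = 2) executing in compiler-reversed order]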
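
To see how x == 1, y == 2 can arise, here is a single-threaded replay of one interleaving, assuming the compiler has reversed both methods so that b = 1 precedes y = a and a = 2 precedes x = b. The class name `SurprisingTrace` is hypothetical and the replay is only a sketch of one possible schedule.

```java
/** Sketch: replays one interleaving of the compiler-reversed statements. */
final class SurprisingTrace {
    static int x, y, a, b;

    static String replay() {
        x = 0; y = 0; a = 0; b = 0;   // all zero, as in the example
        b = 1;   // CPU 1: second source statement of m1(), hoisted first
        a = 2;   // CPU 2: second source statement of m2(), hoisted first
        y = a;   // CPU 1: first source statement of m1(), now runs last
        x = b;   // CPU 2: first source statement of m2(), now runs last
        return "x=" + x + ",y=" + y;
    }

    public static void main(String[] args) {
        System.out.println(replay());   // prints x=1,y=2
    }
}
```

No interleaving of the original statement order produces this pair, which is why the trace is surprising.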

slide-20
SLIDE 20

And It Gets Worse …

[Diagram: CPU 1 and CPU 2, each with a Store Buffer, Cache, and Invalidate Q, connected to Memory; the caches are coordinated by the MESI protocol]

slide-21
SLIDE 21

MESI Protocol

  • Widely known cache coordination protocol
  • Acronym for cache line states: Modified, Exclusive, Shared, Invalid
  • Transfers cache-line “messages” between processor caches
  • Typically coordinated by parallel signaling “bus” within chip or single board

slide-22
SLIDE 22

MESI Example 1-1

[Diagram: CPU 1 and CPU 2, each with a Store Buffer, Cache, and Invalidate Q, connected to Memory]

  • a == 0: cache line holding variable a, value 0
  • a = 7: CPU 2 assigns to a

slide-23
SLIDE 23

MESI Example 1-2

[Diagram: same two-CPU layout]

  • CPU 2 writes value to store buffer (a == 7)
  • “Read/Invalidate” MESI control message (i)
  • The cache line still holds a == 0

slide-24
SLIDE 24

MESI Example 1-3

[Diagram: same two-CPU layout]

  • State: a == 7 pending, a == 0 still cached
  • MESI response data flow (a == 0)
  • Deferred invalidate (i)

slide-25
SLIDE 25

MESI Example 1-4

[Diagram: same two-CPU layout]

  • a == 7: eventual cache write (or not …)
  • a == 0: eventual invalidate (or not …)

slide-26
SLIDE 26

MESI Example 2-1

Credit: http://bit.ly/pjug2013-mckenney-parallel

int A, B; // both zero

CPU 1: void m1() { A = 1; B = 1; }
CPU 2: void m2() { while (B == 0) ; assert(A == 1); }

slide-27
SLIDE 27

MESI Example 2-2

[Diagram: CPU 1 and CPU 2, each with a Store Buffer, Cache, and Invalidate Q, connected to Memory]

  • Initial state: b == 0, a == 0

slide-28
SLIDE 28

MESI Example 2-3

[Diagram: same two-CPU layout]

  • State: b == 0, a == 0
  • a = 1 executes

slide-29
SLIDE 29

MESI Example 2-4

[Diagram: same two-CPU layout]

  • State: a == 1, b == 0
  • “Read/Invalidate” MESI control message (i)

slide-30
SLIDE 30

MESI Example 2-4

[Diagram: same two-CPU layout]

  • State: a == 1, b == 0; invalidate (i) pending
  • MESI read response (a == 0)

slide-31
SLIDE 31

MESI Example 2-5

[Diagram: same two-CPU layout]

  • State: a == 1, b == 0, old copy a == 0; invalidate (i) pending
  • while (b == 0) ; is running
  • “Read” message for b in flight

slide-32
SLIDE 32

MESI Example 2-6

[Diagram: same two-CPU layout]

  • State: a == 1, b == 0, old copy a == 0; invalidate (i) pending
  • b = 1 executes while while (b == 0) ; is running
  • “Read” message for b in flight
  • Write new value of b, but store buffer is full

slide-33
SLIDE 33

MESI Example 2-6

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1, old copy a == 0; invalidate (i) pending
  • “Read” message for b in flight
  • Write b to cache, bypassing store buffer

slide-34
SLIDE 34

MESI Example 2-7

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1, old copy a == 0; invalidate (i) pending
  • while (b == 0) ; is running
  • “Read” message for b processed

slide-35
SLIDE 35

MESI Example 2-8

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1, old copy a == 0; invalidate (i) pending
  • “Read” response for value of b (b == 1) reaches while (b == 0) ;

slide-36
SLIDE 36

MESI Example 2-9

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1, old copy a == 0; invalidate (i) pending
  • assert (a == 1): assertion causes CPU 2 to read value of a

slide-37
SLIDE 37

MESI Example 2-9

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1, old copy a == 0; invalidate (i) pending
  • assert (a == 1): cache supplies stale value of a (a == 0)

slide-38
SLIDE 38

MESI Example 2-10

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1
  • Processes invalidate message (i), but too late
  • Assertion fails!

slide-39
SLIDE 39

Where Are We?

  • Some concurrent traces (“Good” traces) seem much more intuitive than others

void m1() { y = a; b = 1; }

“Good” Trace: [Timeline diagram: m1() runs y = a, then b = 1; m2() runs x = b, then a = 2]

slide-40
SLIDE 40

Others Not So Much

  • “Bad” traces don’t correspond to any possible sequential execution of the original statements

void m1() { y = a; b = 1; }

“Bad” Trace: [Timeline diagram: m1() runs b = 1, then y = a; m2() runs a = 2, then x = b]

slide-41
SLIDE 41

Sequential Consistency

  • “Good” traces correspond to some sequential execution of the original language statements
  • The concept of some sequential execution can be formalized as sequential consistency, or SC.
  • “Bad” traces can be prevented by specifying rules allowing programmers to ensure their code is SC.

slide-42
SLIDE 42

Java Memory Model (JMM)

  • Early Java's memory model was broken
  • JMM introduced in Java 1.5 (2004)
  • Now sections 17.4 and 17.5 of the JLS
  • Based on the concept of a partial order
  • Most memory operations are unordered
  • Abstract happens-before operator defines ordering of specific memory operations
slide-43
SLIDE 43

Typical Rules from JMM

  • “Every memory operation on a given thread happens-before the next memory operation by the same thread in program order.”
  • “All memory operations prior to writing a volatile variable on one thread happen-before a read of the same volatile from another thread.”
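
The two rules combine to give the common "publish through a volatile flag" idiom. The following is a minimal sketch, assuming a hypothetical class `VolatilePublication` with a plain `payload` field and a volatile `ready` flag (none of these names come from the talk).

```java
/** Sketch: a plain field made visible through a volatile flag. */
final class VolatilePublication {
    private int payload;              // plain (non-volatile) field
    private volatile boolean ready;   // volatile flag

    void writer() {
        payload = 42;   // ordered before the next line by the program-order rule
        ready = true;   // volatile write
    }

    int reader() {
        while (!ready) {          // volatile read; loops until the write is seen
            Thread.onSpinWait();
        }
        return payload;           // the volatile rule makes payload == 42 visible here
    }

    /** Runs one reader and one writer thread and returns what the reader saw. */
    static int demo() {
        VolatilePublication p = new VolatilePublication();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = p.reader());
        Thread w = new Thread(p::writer);
        r.start();
        w.start();
        try {
            r.join();   // join() also establishes happens-before, so seen[0] is visible
            w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If `ready` were not volatile, neither rule would connect the two threads: the reader could spin forever or return a stale `payload`.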

slide-44
SLIDE 44

Modified Example 1

volatile int a, b, x, y;

CPU 1: void m1() { y = a; b = 1; }
CPU 2: void m2() { x = b; a = 2; }

y = a must be visible to any thread that can observe b == 1.

x = b must be visible to any thread that can observe a == 2.
slide-45
SLIDE 45

Result

Surprising Trace Prevented

[Timeline diagram: m2() on CPU 2 shown as a = 2, x = b; m1() on CPU 1 shown as b = 1, y = a; the trace is marked “no!”]

  • The two happens-before operations mean that if CPU 2 can observe b = 1, it must also observe y = a.
  • The compiler and runtime cooperate to prevent the non-SC trace from occurring.
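
A small harness can exercise the modified example on real threads. This is a sketch only: the class name `VolatileExample1` and the trial loop are assumptions for illustration. A run that never sees the surprising outcome is consistent with the JMM guarantee but does not prove it.

```java
/** Sketch: runs the volatile version of Example 1 repeatedly on two threads. */
final class VolatileExample1 {
    static volatile int a, b, x, y;

    /** Returns true if any trial ended with x == 1 and y == 2. */
    static boolean surprisingOutcomeSeen(int trials) {
        for (int i = 0; i < trials; i++) {
            a = 0; b = 0; x = 0; y = 0;
            Thread t1 = new Thread(() -> { y = a; b = 1; });   // m1()
            Thread t2 = new Thread(() -> { x = b; a = 2; });   // m2()
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (x == 1 && y == 2) {
                return true;   // would contradict sequential consistency
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(surprisingOutcomeSeen(1000));
    }
}
```

With the `volatile` modifier removed, the same harness is allowed to report the surprising outcome, though in practice it may be rare and hard to reproduce.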

slide-46
SLIDE 46

Rights and Responsibilities

  • Programmer is responsible for ensuring the presence of a happens-before between every pair of references to a given datum.
  • In exchange, JMM guarantees that program behavior will be SC
  • Terminology: a missing happens-before is called a data race.

slide-47
SLIDE 47

Ensuring Happens-Before

  • Single-threaded code naturally has H-Bs
  • To ensure H-Bs in concurrent code, use:
      • immutability (final) with safe publication
      • primitives (volatile, mutex, atomics)
      • concurrency-safe library classes
      • concurrency-safe frameworks and programming models, e.g. Akka
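
Several of the listed tools fit in one short sketch. Everything named here (`HappensBeforeTools`, `Point`, `work`, `demo`) is hypothetical and chosen only for illustration. It combines an immutable class, an atomic counter, a mutex-guarded counter, and a concurrency-safe map from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: a few standard ways to obtain happens-before edges. */
final class HappensBeforeTools {
    /** Immutable value: all fields final, never modified after construction. */
    static final class Point {
        final int px, py;
        Point(int px, int py) { this.px = px; this.py = py; }
    }

    private final AtomicInteger atomicCount = new AtomicInteger();   // atomic primitive
    private final ConcurrentHashMap<String, Point> points =
            new ConcurrentHashMap<>();                               // concurrency-safe library class
    private final Object lock = new Object();                        // mutex
    private int lockedCount;                                         // guarded by lock

    void work(String key) {
        atomicCount.incrementAndGet();
        synchronized (lock) {
            lockedCount++;
        }
        points.put(key, new Point(1, 2));   // safe publication of an immutable object
    }

    /** Calls work() from several threads; returns {atomicCount, lockedCount, map size}. */
    static int[] demo(int threads, int callsPerThread) {
        HappensBeforeTools shared = new HappensBeforeTools();
        List<Thread> pool = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            String key = "thread-" + t;
            Thread th = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    shared.work(key);
                }
            });
            pool.add(th);
            th.start();
        }
        try {
            for (Thread th : pool) {
                th.join();   // join() is itself a happens-before edge
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (shared.lock) {
            return new int[] { shared.atomicCount.get(), shared.lockedCount, shared.points.size() };
        }
    }

    public static void main(String[] args) {
        int[] r = demo(4, 1000);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);   // prints 4000 4000 4
    }
}
```

A plain `int` counter incremented without the lock would be a data race in the sense of the previous slide, and its final value would not be guaranteed.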

slide-48
SLIDE 48

MESI Example 3-1

MESI Example 2 but modified with volatile

volatile int A, B; // both zero

CPU 1: void m1() { A = 1; B = 1; }
CPU 2: void m2() { while (B == 0) ; assert(A == 1); }

slide-49
SLIDE 49

MESI Example 3-2

[Diagram: CPU 1 and CPU 2, each with a Store Buffer, Cache, and Invalidate Q, connected to Memory]

  • Initial state: b == 0, a == 0

slide-50
SLIDE 50

MESI Example 3-3

[Diagram: same two-CPU layout]

  • State: b == 0, a == 0
  • a = 1 executes

slide-51
SLIDE 51

MESI Example 3-4

[Diagram: same two-CPU layout]

  • State: a == 1, b == 0
  • “Read/Invalidate” MESI control message (i)

slide-52
SLIDE 52

MESI Example 3-5

[Diagram: same two-CPU layout]

  • State: a == 1, b == 0; invalidate (i) pending
  • MESI read response (a == 0)

slide-53
SLIDE 53

MESI Example 3-6

[Diagram: same two-CPU layout]

  • State: a == 1, b == 0, old copy a == 0; invalidate (i) pending
  • while (b == 0) ; is running
  • “Read” message for b in flight

slide-54
SLIDE 54

MESI Example 3-7

[Diagram: same two-CPU layout]

  • State: a == 1, b == 0, old copy a == 0; invalidate (i) pending
  • b = 1 executes while while (b == 0) ; is running
  • “Read” message for b in flight
  • Write new value of b, but store buffer is full

slide-55
SLIDE 55

MESI Example 3-8

[Diagram: same two-CPU layout]

  • State: b == 1, a == 1; invalidate (i) pending
  • while (b == 0) ; is running
  • “Read” message for b in flight
  • CHANGE: write to b forces a to cache

slide-56
SLIDE 56

MESI Example 3-9

[Diagram: same two-CPU layout]

  • State: b == 1, a == 1; invalidate (i) pending
  • while (b == 0) ; is running
  • “Read” message processed

slide-57
SLIDE 57

MESI Example 3-10

[Diagram: same two-CPU layout]

  • State: b == 1, a == 1; invalidate (i) pending
  • “Read” response for b (b == 1) reaches while (b == 0) ;

slide-58
SLIDE 58

MESI Example 3-11

[Diagram: same two-CPU layout]

  • State: b == 1, a == 1; invalidate (i) pending
  • assert (a == 1): assertion causes CPU 2 to read value of a

slide-59
SLIDE 59

MESI Example 3-12

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1
  • assert (a == 1): stalled for cache …
  • CHANGE: read value of a forces invalidate (i)

slide-60
SLIDE 60

MESI Example 3-13

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1
  • assert (a == 1): stalled for cache …
  • MESI “read” message issued for a

slide-61
SLIDE 61

MESI Example 3-14

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1
  • assert (a == 1): stalled for cache …
  • “Read” response for value of a (a == 1)

slide-62
SLIDE 62

MESI Example 3-15

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1
  • assert (a == 1): stalled for cache …
  • Cache supplying a (a == 1)

slide-63
SLIDE 63

MESI Example 3-16

[Diagram: same two-CPU layout]

  • State: a == 1, b == 1
  • assert (a == 1): assertion passes!
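
The difference between the Example 2 and Example 3 walk-throughs can be mimicked in a deliberately tiny, single-threaded model. This is a toy under loud assumptions: the class `ToyMesi`, its maps and queue, and the `fenced` flag are invented for illustration. They stand in for the store buffer, caches, and invalidate queue, and do not model real MESI hardware or what a JIT actually emits.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: CPU 1 has a store buffer, CPU 2 has an invalidate queue. */
final class ToyMesi {
    private final Map<String, Integer> cpu1Cache = new HashMap<>();
    private final Map<String, Integer> cpu1StoreBuffer = new LinkedHashMap<>();
    private final Map<String, Integer> cpu2Cache = new HashMap<>();
    private final Deque<String> cpu2InvalidateQ = new ArrayDeque<>();

    ToyMesi() {
        cpu1Cache.put("b", 0);   // assumption: CPU 1 holds the line for b
        cpu2Cache.put("a", 0);   // assumption: CPU 2 holds the line for a
    }

    /** CPU 1 writes a line held elsewhere: value is buffered, invalidate is queued at CPU 2. */
    void cpu1StoreToRemoteLine(String name, int value) {
        cpu1StoreBuffer.put(name, value);
        cpu2InvalidateQ.add(name);
    }

    /** CPU 1 writes a line it holds; with a fence, earlier buffered stores reach the cache first. */
    void cpu1StoreToOwnedLine(String name, int value, boolean fenced) {
        if (fenced) {
            cpu1Cache.putAll(cpu1StoreBuffer);
            cpu1StoreBuffer.clear();
        }
        cpu1Cache.put(name, value);
    }

    /** CPU 2 reads; with a fence, queued invalidates are applied before the cache lookup. */
    int cpu2Load(String name, boolean fenced) {
        if (fenced) {
            while (!cpu2InvalidateQ.isEmpty()) {
                cpu2Cache.remove(cpu2InvalidateQ.poll());
            }
        }
        Integer cached = cpu2Cache.get(name);
        if (cached != null) {
            return cached;                               // hit, possibly stale
        }
        int fetched = cpu1Cache.getOrDefault(name, 0);   // miss: "read" response from CPU 1
        cpu2Cache.put(name, fetched);
        return fetched;
    }

    /** Replays m1() then m2() and returns the value of a that the assertion would see. */
    static int valueOfASeenByAssert(boolean fenced) {
        ToyMesi m = new ToyMesi();
        m.cpu1StoreToRemoteLine("a", 1);          // a = 1
        m.cpu1StoreToOwnedLine("b", 1, fenced);   // b = 1
        while (m.cpu2Load("b", fenced) == 0) {    // while (b == 0) ;
            // spin
        }
        return m.cpu2Load("a", fenced);           // assert (a == 1)
    }

    public static void main(String[] args) {
        System.out.println(valueOfASeenByAssert(false));   // prints 0: stale a, assertion fails
        System.out.println(valueOfASeenByAssert(true));    // prints 1: assertion passes
    }
}
```

The `fenced` flag plays the role of the two CHANGE steps: flushing the store buffer before the write to b, and processing the invalidate queue before the read of a.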

slide-64
SLIDE 64

Where Are We?

  • Volatile is one way to express happens-before relationships
  • Prevents reordering in the compilers
  • At runtime, JIT generates architecture-specific opcodes to
      • prevent memory op reordering in hardware
      • prevent deferred processing in hardware
slide-65
SLIDE 65

Generalizing on the JMM…

  • Go
      • New language from Google
      • Memory model expressed in terms of “happens-before” as in JMM.
  • Akka
      • Async framework for Java, Scala, …
      • Spec makes reference to JMM
slide-66
SLIDE 66

Other Languages

  • C
      • Explicit (compiler directives, asms)
  • C++
      • Interesting memory model in C++ 2011
  • Objective-C
      • Also low level, language-specific features
slide-67
SLIDE 67

And More Languages

  • C#
      • Similar to Java
  • Rust
      • Concurrent task abstraction (à la Occam?); no shared memory in “safe” code
  • Dalvik (Android virtual machine)
      • Historically broken (Stack Overflow post)
slide-68
SLIDE 68

Explicit Control in C

  • Compiler directives/annotations/asms to prevent aggressive compiler reordering
  • Linux kernel: macros expand to explicit memory barrier instructions

void m1(void) { stmt-1; smp_mb(); stmt-2; }
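
For comparison with the rest of the deck, Java (9 and later) also exposes explicit ordering through `java.lang.invoke.VarHandle`, including a static `VarHandle.fullFence()`. The sketch below is not a translation of `smp_mb()`. It is a hypothetical class, `ExplicitOrdering`, that uses release/acquire access modes on a plain field to get an ordering similar to the volatile flag idiom.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/** Sketch: explicit release/acquire ordering on a plain field via VarHandle. */
final class ExplicitOrdering {
    private int data;        // plain field
    private boolean ready;   // plain field, accessed only through READY

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(ExplicitOrdering.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void writer() {
        data = 7;
        READY.setRelease(this, true);   // earlier writes are not reordered after this store
    }

    int reader() {
        while (!(boolean) READY.getAcquire(this)) {   // later reads are not reordered before this load
            Thread.onSpinWait();
        }
        return data;
    }

    /** Runs one reader and one writer thread and returns what the reader saw. */
    static int demo() {
        ExplicitOrdering e = new ExplicitOrdering();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = e.reader());
        Thread w = new Thread(e::writer);
        r.start();
        w.start();
        try {
            r.join();
            w.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In line with the talk's closing advice, this level of control is rarely needed in application code. `volatile` or the `java.util.concurrent` classes are usually the better choice.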

slide-69
SLIDE 69

Summary

  • These issues affect all languages that support programming with threads
  • Java community was ahead of the curve in addressing them
  • Awareness wins - you may not program against the JMM, but understanding it is powerful.
  • Keep learning - avoid “DIY” and use the highest level tools you can.

slide-70
SLIDE 70

References

http://bitly.com/bundles/pdxjjb/2

  • Contains all the “bit.ly” links from this presentation

slide-71
SLIDE 71

THANK YOU

  • Java Agent team and so many others at New Relic for attending my practice talks and providing feedback…
  • And everyone who has attended one version or another of this talk.

slide-72
SLIDE 72

Q&A

  • Followed By
  • Lunch