Stalking the Lost Write: Memory Visibility in Concurrent Java

  1. Stalking the Lost Write: Memory Visibility in Concurrent Java
     Jeff Berkowitz, New Relic
     QCon San Francisco, November 2014

  2. The Computer We Imagine
     CPU:
       . . .
       statement-1;
       statement-2;
       if (b) statement-3;
       while (cond) { statement-4; }
       . . .
     [Diagram: the CPU reads from and writes to Memory directly]

  3. The Compiler We Imagine
     (Typical assembly language - no particular CPU)
     Java    Assembly Language
     x++;    mov mem.x, reg1
             incr reg1
             mov reg1, mem.x
     y++;    mov mem.y, reg1
             incr reg1
             mov reg1, mem.y

  5. The Compiler We Get
     Java    Assembly Language
     x++;    mov mem.x, reg1
             mov mem.y, reg2
             incr reg1
             mov reg1, mem.x
     y++;    incr reg2
             mov reg2, mem.y

  7. The End Result
     (Typical micro operations - no particular CPU)
     Java    Assembly Language    Hardware Level
     x++;    mov mem.x, reg1      rd.issue(x)
             mov mem.y, reg2      rd.issue(y)
             incr reg1            resp.mov(r1)
             mov reg1, mem.x      resp.mov(r2)
     y++;    incr reg2            incr r1
             mov reg2, mem.y      wr.async(r1, x)
                                  incr r2
                                  wr.async(r2, y)

  9. The Multiprocessor We Imagine
     There are no caches or memory buffering here.
     [Diagram: two CPUs, each reading and writing one shared Memory directly]

  10. Code Example 1
      int x, y, a, b; // all zero
      CPU 1: void m1() { y = a; b = 1; }
      CPU 2: void m2() { x = b; a = 2; }
      Possible outcomes for x and y?

  11. Possible Trace 1
      Time order: y = a (m1), b = 1 (m1), x = b (m2), a = 2 (m2)
      Outcome: x == 1, y == 0

  12. Possible Trace 2
      Time order: y = a (m1), x = b (m2), b = 1 (m1), a = 2 (m2)
      Outcome: x == 0, y == 0

  13. Possible Trace 3
      Time order: y = a (m1), x = b (m2), a = 2 (m2), b = 1 (m1)
      Outcome: x == 0, y == 0

  14. Possible Trace 4
      Time order: x = b (m2), y = a (m1), a = 2 (m2), b = 1 (m1)
      Outcome: x == 0, y == 0

  15. Possible Trace 5
      Time order: x = b (m2), a = 2 (m2), y = a (m1), b = 1 (m1)
      Outcome: x == 0, y == 2

  16. Is That It?
      • It looks like x or y must be 0 in the result
      • Makes sense: whichever of m1() and m2() runs its first statement first grabs a 0
      • Is our reasoning correct?
      int x, y, a, b; // all zero
      void m1() { y = a; b = 1; }
      void m2() { x = b; a = 2; }
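
The reasoning on this slide can be checked mechanically, as long as we assume each thread runs its two statements in program order. The sketch below is not from the talk; the class and method names are made up for illustration. It enumerates every interleaving of m1() and m2() under that assumption and collects the resulting (x, y) pairs.

```java
import java.util.Set;
import java.util.TreeSet;

class InterleavingOutcomes {
    // Shared variables from Code Example 1, stored in an array: x, y, a, b.
    static final int X = 0, Y = 1, A = 2, B = 3;

    // Executes statement `step` (0 or 1) of thread 1 (m1) or thread 2 (m2).
    static void execute(int thread, int step, int[] v) {
        if (thread == 1) {
            if (step == 0) v[Y] = v[A]; else v[B] = 1;   // m1: y = a; b = 1;
        } else {
            if (step == 0) v[X] = v[B]; else v[A] = 2;   // m2: x = b; a = 2;
        }
    }

    // Tries every interleaving that keeps each thread's statements in program order.
    static void explore(int pc1, int pc2, int[] v, Set<String> outcomes) {
        if (pc1 == 2 && pc2 == 2) {
            outcomes.add("x=" + v[X] + ",y=" + v[Y]);
            return;
        }
        if (pc1 < 2) {
            int[] copy = v.clone();
            execute(1, pc1, copy);
            explore(pc1 + 1, pc2, copy, outcomes);
        }
        if (pc2 < 2) {
            int[] copy = v.clone();
            execute(2, pc2, copy);
            explore(pc1, pc2 + 1, copy, outcomes);
        }
    }

    // Returns the distinct outcomes over all program-order interleavings.
    static Set<String> outcomes() {
        Set<String> result = new TreeSet<>();
        explore(0, 0, new int[4], result);   // all four variables start at zero
        return result;
    }

    public static void main(String[] args) {
        System.out.println(outcomes());
    }
}
```

Under the program-order assumption there are six interleavings and three distinct outcomes, and x == 1, y == 2 is not among them. The enumeration says nothing about what a real compiler or CPU is allowed to do.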

  17. Surprisingly, No
      Counterintuitively, the compiler can reverse the order of the two statements in each method
      Java                            Assembly Language
      void m1() { y = a; b = 1; }     mov #1, mem.b
                                      mov mem.a, mem.y
      void m2() { x = b; a = 2; }     mov #2, mem.a
                                      mov mem.b, mem.x

  18. Intuitive Trace
      m1()     m2()
      y = a    x = b
      b = 1    a = 2
      Outcome: x == 0, y == 0

  19. Surprising Trace
      m1()     m2()
      b = 1    a = 2
      y = a    x = b
      Outcome: x == 1, y == 2
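
To see how the reordering makes the surprising outcome reachable, here is a small sketch (hypothetical names, not from the talk) that enumerates interleavings for a given per-thread statement order. Passing the reversed orders from the "Surprisingly, No" slide shows x == 1, y == 2 among the results.

```java
import java.util.Set;
import java.util.TreeSet;

class ReorderedOutcomes {
    static final int X = 0, Y = 1, A = 2, B = 3;

    // Statement ids: 0 is "y = a", 1 is "b = 1" (m1); 2 is "x = b", 3 is "a = 2" (m2).
    static void execute(int stmt, int[] v) {
        switch (stmt) {
            case 0: v[Y] = v[A]; break;
            case 1: v[B] = 1; break;
            case 2: v[X] = v[B]; break;
            default: v[A] = 2; break;
        }
    }

    // Explores every interleaving of the two statement sequences t1 and t2,
    // keeping the order given inside each sequence.
    static void explore(int[] t1, int i, int[] t2, int j, int[] v, Set<String> out) {
        if (i == t1.length && j == t2.length) {
            out.add("x=" + v[X] + ",y=" + v[Y]);
            return;
        }
        if (i < t1.length) {
            int[] copy = v.clone();
            execute(t1[i], copy);
            explore(t1, i + 1, t2, j, copy, out);
        }
        if (j < t2.length) {
            int[] copy = v.clone();
            execute(t2[j], copy);
            explore(t1, i, t2, j + 1, copy, out);
        }
    }

    static Set<String> outcomes(int[] t1, int[] t2) {
        Set<String> result = new TreeSet<>();
        explore(t1, 0, t2, 0, new int[4], result);
        return result;
    }

    public static void main(String[] args) {
        // Program order: m1 = {y = a; b = 1}, m2 = {x = b; a = 2}
        System.out.println(outcomes(new int[] {0, 1}, new int[] {2, 3}));
        // Reordered as on the slide: m1 = {b = 1; y = a}, m2 = {a = 2; x = b}
        System.out.println(outcomes(new int[] {1, 0}, new int[] {3, 2}));
    }
}
```

This only models reordering inside each thread. It does not model store buffers or caches, which can produce the same visible effect without the compiler changing anything.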

  20. And It Gets Worse …
      [Diagram: CPU 1 and CPU 2 each have their own Store Buffer, Cache, and Invalidate Q; the caches and Memory are connected by the MESI protocol]

  21. MESI Protocol
      • Widely known cache coordination protocol
      • Acronym for cache line states: Modified, Exclusive, Shared, Invalid
      • Transfers cache-line “messages” between processor caches
      • Typically coordinated by a parallel signaling “bus” within a chip or single board
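
As a rough illustration of the four states, the sketch below encodes the textbook MESI transitions for a single cache line as seen from one cache. It is a simplification written for this transcript, not code from the talk: the event names are assumptions, and it ignores the messages, write-backs, store buffers, and invalidate queues that the following slides are about.

```java
class MesiSketch {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    // Events as seen by one cache: its own CPU reads or writes the line,
    // or another CPU's read or write of the line is observed.
    enum Event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

    // Returns the next state of the line. `otherCachesHoldLine` only matters
    // when a local read misses (state INVALID).
    static State next(State s, Event e, boolean otherCachesHoldLine) {
        switch (e) {
            case LOCAL_READ:
                if (s == State.INVALID) {
                    return otherCachesHoldLine ? State.SHARED : State.EXCLUSIVE;
                }
                return s;                        // read hit: state unchanged
            case LOCAL_WRITE:
                return State.MODIFIED;           // other copies must be invalidated first
            case REMOTE_READ:
                // A valid copy becomes shared (a MODIFIED line is written back first).
                return s == State.INVALID ? State.INVALID : State.SHARED;
            default:                             // REMOTE_WRITE
                return State.INVALID;            // another CPU now owns the line
        }
    }

    public static void main(String[] args) {
        State s = State.INVALID;
        s = next(s, Event.LOCAL_READ, false);    // EXCLUSIVE
        s = next(s, Event.LOCAL_WRITE, false);   // MODIFIED
        s = next(s, Event.REMOTE_READ, true);    // SHARED
        s = next(s, Event.REMOTE_WRITE, true);   // INVALID
        System.out.println(s);
    }
}
```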

  22. MESI Example 1-1
      CPU 1: a == 0 (cache line holding variable a, value 0)
      CPU 2: assigns to a (a = 7)

  23. MESI Example 1-2
      CPU 1: a == 0, invalidate queued (i)
      CPU 2: a == 7 (CPU 2 writes value to store buffer)
      “Read/Invalidate” MESI control message

  24. MESI Example 1-3
      CPU 1: a == 0, invalidate queued (i); deferred invalidate
      CPU 2: a == 7, a == 0
      MESI response data flow

  25. MESI Example 1-4
      CPU 1: a == 0; eventual invalidate (or not …)
      CPU 2: a == 7; eventual cache write (or not …)

  26. MESI Example 2-1
      Credit: http://bit.ly/pjug2013-mckenney-parallel
      int A, B; // both zero
      CPU 1: void m1() { A = 1; B = 1; }
      CPU 2: void m2() { while (B == 0) ; assert(A == 1); }

  27. MESI Example 2-2
      CPU 1: b == 0
      CPU 2: a == 0

  28. MESI Example 2-3
      CPU 1: b == 0; executes a = 1
      CPU 2: a == 0

  29. MESI Example 2-4
      CPU 1: a == 1, b == 0; “Read/Invalidate” MESI control message
      CPU 2: a == 0, invalidate queued (i)

  30. MESI Example 2-4
      CPU 1: a == 1, b == 0, a == 0; MESI read response
      CPU 2: a == 0, invalidate queued (i)

  31. MESI Example 2-5
      CPU 1: a == 1, b == 0, a == 0
      CPU 2: a == 0, invalidate queued (i); executes while (b == 0) ;
      “Read” message for b in flight

  32. MESI Example 2-6
      CPU 1: a == 1, b == 0, a == 0; executes b = 1
      Write new value of b, but store buffer is full
      CPU 2: a == 0, invalidate queued (i); executes while (b == 0) ;
      “Read” message for b in flight

  33. MESI Example 2-6
      CPU 1: a == 1, b == 1, a == 0; executes b = 1
      Write b to cache, bypassing store buffer
      CPU 2: a == 0, invalidate queued (i); executes while (b == 0) ;
      “Read” message for b in flight

  34. MESI Example 2-7
      CPU 1: a == 1, b == 1, a == 0
      CPU 2: a == 0, invalidate queued (i); executes while (b == 0) ;
      “Read” message for b processed

  35. MESI Example 2-8
      CPU 1: a == 1, b == 1, a == 0
      CPU 2: a == 0, invalidate queued (i), b == 1; executes while (b == 0) ;
      “Read” response for value of b

  36. MESI Example 2-9
      CPU 1: a == 1, b == 1, a == 0
      CPU 2: a == 0, invalidate queued (i), b == 1; executes assert (a == 1)
      Assertion causes CPU 2 to read value of a

  37. MESI Example 2-9
      CPU 1: a == 1, b == 1, a == 0
      CPU 2: a == 0, invalidate queued (i), b == 1; executes assert (a == 1)
      Cache supplies stale value of a

  38. MESI Example 2-10
      CPU 1: a == 1, b == 1
      CPU 2: a == 0, invalidate queued (i), b == 1; (assertion fail)
      Processes invalidate message, but too late. Assertion fails!
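
The sequence in MESI Example 2 can be replayed with a deliberately tiny model. The sketch below is a toy written for this transcript: the `Cpu` class, the `run` method, and the use of a single `memory` map to stand in for CPU 1's side of the protocol are all assumptions, and no real hardware is this simple. It shows only the key effect: CPU 2 reads a from its own cache before it has applied the queued invalidate.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class InvalidateQueueSketch {
    // Toy model of one CPU: a private cache plus a queue of unprocessed invalidates.
    static class Cpu {
        final Map<String, Integer> cache = new HashMap<>();
        final Queue<String> invalidateQueue = new ArrayDeque<>();

        // Reads from the local cache if the line is present; otherwise fetches from `memory`.
        int read(String var, Map<String, Integer> memory) {
            Integer cached = cache.get(var);
            if (cached != null) return cached;
            int value = memory.get(var);
            cache.put(var, value);
            return value;
        }

        // Applies all queued invalidates by dropping the affected lines.
        void drainInvalidateQueue() {
            while (!invalidateQueue.isEmpty()) cache.remove(invalidateQueue.poll());
        }
    }

    // Replays the slide sequence and returns the value of a that CPU 2 sees
    // when it evaluates assert(a == 1).
    static int run(boolean drainBeforeReadingA) {
        Map<String, Integer> memory = new HashMap<>();
        memory.put("a", 0);
        memory.put("b", 0);
        Cpu cpu2 = new Cpu();
        cpu2.cache.put("a", 0);                 // CPU 2 starts with a == 0 cached

        // CPU 1 executes a = 1: CPU 2 acknowledges but only queues the invalidate.
        cpu2.invalidateQueue.add("a");
        memory.put("a", 1);                     // simplification: CPU 1's value lives in `memory`
        // CPU 1 executes b = 1.
        memory.put("b", 1);

        // CPU 2: while (b == 0) ; b is not cached, so the read sees 1 and the loop exits.
        while (cpu2.read("b", memory) == 0) { }

        if (drainBeforeReadingA) cpu2.drainInvalidateQueue();
        return cpu2.read("a", memory);
    }

    public static void main(String[] args) {
        System.out.println(run(false));   // stale value: the assertion would fail
        System.out.println(run(true));    // fresh value: the assertion would pass
    }
}
```

Draining the queue before the read is roughly what a read memory barrier forces on hardware that has invalidate queues; in this model it is just a flag.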

  39. Where Are We?
      • Some concurrent traces (“Good” traces) seem much more intuitive than others
      void m1() { y = a; b = 1; }
      “Good” Trace: x = b (m2), y = a (m1), a = 2 (m2), b = 1 (m1)

  40. Others Not So Much
      • “Bad” traces don’t correspond to any possible sequential execution of the original statements
      void m1() { y = a; b = 1; }
      “Bad” Trace: a = 2 (m2), b = 1 (m1), x = b (m2), y = a (m1)

  41. Sequential Consistency
      • “Good” traces correspond to some sequential execution of the original language statements
      • The concept of some sequential execution can be formalized as sequential consistency, or SC
      • “Bad” traces can be prevented by specifying rules that allow programmers to ensure their code is SC
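
The distinction between “Good” and “Bad” traces can be expressed as a simple check: a trace is acceptable if every thread's statements appear in it in program order. The sketch below is an illustration with made-up names; it treats statements as unique strings and covers only this ordering condition, not a full formal definition of sequential consistency.

```java
import java.util.Arrays;
import java.util.List;

class TraceCheck {
    // Program order for the two methods of Code Example 1.
    static final List<String> M1 = Arrays.asList("y = a", "b = 1");
    static final List<String> M2 = Arrays.asList("x = b", "a = 2");

    // Returns true if each program's statements appear in `trace`
    // in the same relative order as in the program itself.
    static boolean preservesProgramOrder(List<String> trace, List<List<String>> programs) {
        for (List<String> program : programs) {
            int last = -1;
            for (String stmt : program) {
                int pos = trace.indexOf(stmt);
                if (pos < 0 || pos < last) return false;   // missing or out of order
                last = pos;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> programs = Arrays.asList(M1, M2);
        // "Good" trace from the slides
        System.out.println(preservesProgramOrder(
                Arrays.asList("x = b", "y = a", "a = 2", "b = 1"), programs));
        // "Bad" trace from the slides
        System.out.println(preservesProgramOrder(
                Arrays.asList("a = 2", "b = 1", "x = b", "y = a"), programs));
    }
}
```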

  42. Java Memory Model (JMM)
      • Early Java was broken
      • Revised JMM (JSR-133) introduced in Java 1.5 (2004)
      • Now sections 17.4 and 17.5 of the JLS
      • Based on the concept of a partial order
      • Most memory operations are unordered
      • Abstract happens-before operator defines ordering of specific memory operations

  43. Typical Rules from JMM
      • “Every memory operation on a given thread happens-before the next memory operation by the same thread in program order.”
      • “All memory operations prior to writing a volatile variable on one thread happen-before a read of the same volatile from another thread.”
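
The two rules combine into a common publishing idiom: write ordinary data, then write a volatile flag; a reader that sees the flag set is guaranteed to see the data. The sketch below is an illustration with hypothetical names (`data`, `ready`, `runOnce`), not code from the talk.

```java
class VolatilePublish {
    static int data;                 // ordinary (non-volatile) field
    static volatile boolean ready;   // volatile flag that publishes `data`

    // Starts a reader and a writer thread and returns the value the reader saw.
    static int runOnce() {
        data = 0;
        ready = false;
        final int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (!ready) { }       // volatile read: spins until the writer sets the flag
            seen[0] = data;          // ordered after the volatile read, so it sees 42
        });
        Thread writer = new Thread(() -> {
            data = 42;               // ordinary write, before the volatile write in program order
            ready = true;            // volatile write
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();           // join() makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

If `ready` were not volatile, the reader could legally spin forever or observe `data == 0`, since no happens-before edge would connect the two threads.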

  44. Modified Example 1
      volatile int a, b, x, y;
      CPU 1: void m1() { y = a; b = 1; }
      CPU 2: void m2() { x = b; a = 2; }
      • The write y = a must be visible to any thread that can observe b == 1
      • The write x = b must be visible to any thread that can observe a == 2
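
A runnable version of the modified example might look like the sketch below. The harness around m1() and m2() (the class name, the round count, the counting method) is an assumption added for illustration. It runs the two methods on separate threads repeatedly and counts how often the surprising outcome x == 1, y == 2 appears.

```java
class VolatileExample {
    static volatile int a, b, x, y;

    static void m1() { y = a; b = 1; }
    static void m2() { x = b; a = 2; }

    // Runs m1() and m2() concurrently `rounds` times and counts
    // the rounds that end with x == 1 && y == 2.
    static int countSurprisingOutcomes(int rounds) {
        int surprising = 0;
        for (int i = 0; i < rounds; i++) {
            a = 0; b = 0; x = 0; y = 0;          // reset before starting the threads
            Thread t1 = new Thread(VolatileExample::m1);
            Thread t2 = new Thread(VolatileExample::m2);
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            if (x == 1 && y == 2) surprising++;
        }
        return surprising;
    }

    public static void main(String[] args) {
        System.out.println(countSurprisingOutcomes(1000));
    }
}
```

With all four fields volatile the count stays at zero, because volatile accesses are totally ordered consistently with each thread's program order. A zero count from a stress run is only supporting evidence; the guarantee comes from the memory model, not from the test.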
