CPL 2016, week 1: Java threads and inter-thread visibility. Oleg Batrashev. PowerPoint PPT presentation.





SLIDE 1

CPL 2016, week 1

Java threads and inter-thread visibility Oleg Batrashev

Institute of Computer Science, Tartu, Estonia

February 10, 2016

SLIDE 2

Course information

◮ 1 lecture, 1 lab each week

◮ moving lecture/lab time? (Monday 10-16)

◮ Languages (poll)

◮ Java 40%
◮ Erlang 20%
◮ Clojure 40%

◮ Total score consists of

◮ weekly homeworks 20% in total
◮ two small projects, 20% each
◮ written exam 40%

◮ I create lecture notes where I add comments to the slides

SLIDE 3

Agenda

Java threads
  Creating threads
  Measuring time
Inter-thread visibility
  Example: infinite loop
  Hardware: invalid read/write order
  Example: read/write access ordering
  Hardware: cache subsystem
Java memory model
  Example: faulty publication
  Double-Checked Locking
  Object safe publication
Conclusions

SLIDE 4

Extend Thread class

  • 1. Extend your class from the Thread class,
  • 2. Create an object of your class,
  • 3. Start your thread by calling the start() method on the object.

This may be the simplest method, but it is not very flexible. Using the Runnable interface, as shown next, is the preferred way.

◮ How many of you know this?
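The three steps above can be sketched as a complete program. The SumThread class and its summing task are illustrative, not from the slides; note that join() is what makes the thread's result visible afterwards, a point the course returns to later.

```java
// Step 1: extend Thread and override run().
class SumThread extends Thread {
    private final int n;
    long result;                      // plain field: join() makes it visible
    SumThread(int n) { this.n = n; }
    @Override public void run() {
        long s = 0;
        for (int i = 1; i <= n; i++) s += i;
        result = s;
    }
}

public class ExtendThreadDemo {
    public static long sumInThread(int n) {
        SumThread t = new SumThread(n);   // step 2: create an object
        t.start();                        // step 3: start the thread
        try {
            t.join();                     // wait; the thread's writes are now visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return t.result;
    }

    public static void main(String[] args) {
        System.out.println(sumInThread(100));  // prints 5050
    }
}
```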

SLIDE 5

Implement Runnable interface

Create a class:

class Reader implements Runnable {
    @Override
    public void run() {
        // new thread starts execution here
    }
}

Create thread object and pass it new class object:

Thread t = new Thread(new Reader());
t.start();
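As an aside not on the slide: since Java 8, Runnable is a functional interface, so a lambda can replace the Reader class. A minimal sketch (the class name, AtomicInteger box, and the value 42 are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RunnableLambdaDemo {
    public static int runTask() {
        AtomicInteger box = new AtomicInteger();
        // The lambda body is the run() method of an anonymous Runnable.
        Thread t = new Thread(() -> box.set(42));
        t.start();
        try {
            t.join();    // wait for the thread; its writes are then visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return box.get();
    }

    public static void main(String[] args) {
        System.out.println(runTask());  // prints 42
    }
}
```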

SLIDE 6

System.nanoTime()

◮ measures wall-time (elapsed time)

◮ as opposed to user time or CPU time

◮ valid only within the same JVM
◮ much better accuracy than currentTimeMillis()

Example:

long mainStartTime = System.nanoTime();
while (i < N) i++;
t.join();
System.out.println("Time difference " + (readerStartTime - mainStartTime));

◮ join() waits for the other thread to finish, which makes its

values available, e.g. readerStartTime
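The fragment above can be sketched as a self-contained program. The class name and the lambda-based reader are my additions; the point is that after join() the reader's write to readerStartTime is guaranteed visible even without synchronization on the field itself.

```java
public class NanoTimeDemo {
    static long readerStartTime;    // plain field: join() makes it visible

    public static long measure() {
        Thread t = new Thread(() -> readerStartTime = System.nanoTime());
        long mainStartTime = System.nanoTime();
        t.start();
        try {
            t.join();   // waits for the thread; its writes are now visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return readerStartTime - mainStartTime;
    }

    public static void main(String[] args) {
        System.out.println("Time difference " + measure());
    }
}
```

Since nanoTime() is monotonic within one JVM and the reader's call happens after main's, the difference is never negative.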

SLIDE 7

CountDownLatch

CountDownLatch may be used to synchronize threads [3, sec. 5.5.1]:

latch = new CountDownLatch(1);
Thread t = new Thread(new Reader());
t.start();
latch.await();

◮ await() method suspends until the latch is zero
◮ countDown() method decreases the latch value

public void run() {
    latch.countDown();
    try {
        latch.await();
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
    readerStartTime = System.nanoTime();
}
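Putting the two fragments together gives a runnable sketch (the class name and lambda are my additions; the logic follows the slide: the reader counts down and both threads then pass await() at roughly the same moment):

```java
import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    static final CountDownLatch latch = new CountDownLatch(1);
    static long readerStartTime;      // visible to main after join()

    public static long demo() {
        Thread t = new Thread(() -> {
            latch.countDown();        // releases main ...
            try {
                latch.await();        // ... and returns immediately: latch is zero
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            readerStartTime = System.nanoTime();
        });
        t.start();
        try {
            latch.await();            // suspends until the reader counts down
            t.join();                 // wait so readerStartTime is set and visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return readerStartTime;
    }

    public static void main(String[] args) {
        System.out.println("reader started at " + demo());
    }
}
```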

SLIDE 8

Visibility problems

Single-threaded programs

◮ always see the last written value of a variable

    x = 5
    ...
    y = 2 * x

Multi-threaded programs

◮ can see variable values written from other threads,
◮ moreover, there is uncertainty in the order the changes are seen in the current thread.

There are several reasons why such a situation can happen:

  • 1. CPU registers
  • 2. CPU instructions executing out-of-order
  • 3. Compiler re-ordering optimizations
SLIDE 9

Example code

Variable i is visible to the Main and Reader threads:

private static final int N = 50000000;
private static /* volatile */ int i = 0;

Main thread writes the variable in the loop:

Thread t = new Thread(new Reader());
t.start();
while (i < N) i++;
System.out.println("Main finished");

Reader thread reads it in a loop and exits:

int cnt = 0;
while (i < N) {
    cnt++;
}
System.out.println("Count " + cnt);
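A complete, runnable version of this example is sketched below (the class name, the lambda reader, and the long counter are my additions). With volatile commented in, as here, the Reader is guaranteed to eventually see i reach N and terminate; with a plain int it may spin forever on some JIT/hardware combinations, which is the point of the demonstration.

```java
public class VisibilityDemo {
    private static final int N = 50_000_000;
    // volatile guarantees the Reader eventually sees main's writes to i.
    private static volatile int i = 0;

    public static long run() {
        final long[] cntBox = new long[1];   // long: the reader may spin a lot
        Thread reader = new Thread(() -> {
            long cnt = 0;
            while (i < N) { cnt++; }
            cntBox[0] = cnt;
        });
        reader.start();
        while (i < N) i++;
        try {
            reader.join();                   // join() makes cntBox[0] visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return cntBox[0];
    }

    public static void main(String[] args) {
        System.out.println("Count " + run());
    }
}
```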

SLIDE 10

Memory hierarchy

The hierarchy consists of the following layers:

◮ Registers inside the core (CPU) – contain the values of instructions being executed;
◮ Dedicated cache (Level 1) – each core has its own;
◮ Shared cache (e.g. Level 2) – common for all cores;
◮ Main memory (RAM) – just your memory.

The following are typical sizes and access delays:

                      Access delay   Size
Registers             < 1 ns         several KBytes
Level 1 cache         1-2 ns         30-100 KBytes
Level 2 cache         5-15 ns        several MBytes
Main memory (RAM)     100-300 ns     several GBytes

SLIDE 11

Memory read/write ordering

When running machine code, the CPU (core) may execute it out-of-order, and thus write/read memory in a different order. The latter is mostly eliminated by load/store buffers:

  • 1. all reads from cache are program ordered;
  • 2. all writes to cache are program ordered;
  • 3. reads may be re-ordered before writes to different memory locations;
  • 4. there are exceptions as described in [1] (vol. 3, sec. 8.2 “Memory ordering”);
  • 5. a memory fence instruction [5] must be used to disallow moving reads before writes.

It is complex and platform dependent! The programmer needs something simpler to grasp!

(Diagram: CPU registers, store/load buffers, cache, memory)

SLIDE 12

Example code

private static volatile boolean isRunning = true;
private static /* volatile only in b case */ int x = 0;
private static /* volatile only in c case */ int y = 0;

Writer thread writes x and then y, with increasing values:

for (int i = 0; i < N; i++) {
    x = i;
    y = i;
}
isRunning = false;

Reader thread reads x and y in the opposite order:

int xl, yl, cnt = 0;
do {
    yl = y;
    xl = x;
    if (xl < yl) cnt++;
} while (isRunning);

cnt tells how many times x < y.
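The experiment can be sketched as one runnable program. The class name, the lambda reader, and the reduced N are my additions; this version hard-codes the "c" case (y volatile), for which the later slides argue the count is guaranteed to be zero.

```java
public class OrderingDemo {
    private static volatile boolean isRunning = true;
    private static int x = 0;                 // plain field
    private static volatile int y = 0;        // the "c" case: y is volatile

    public static int countInversions() {
        final int N = 1_000_000;              // smaller than the slide's N, for speed
        final int[] cntBox = new int[1];
        Thread reader = new Thread(() -> {
            int cnt = 0;
            do {
                int yl = y;                   // read y first ...
                int xl = x;                   // ... then x
                if (xl < yl) cnt++;
            } while (isRunning);
            cntBox[0] = cnt;
        });
        reader.start();
        for (int i = 0; i < N; i++) { x = i; y = i; }   // write x first, then y
        isRunning = false;
        try {
            reader.join();                    // join() makes cntBox[0] visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return cntBox[0];
    }

    public static void main(String[] args) {
        // With y volatile the JMM guarantees xl >= yl, so this prints 0.
        System.out.println("x < y count: " + countInversions());
    }
}
```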

SLIDE 13

Example analysis and results

If operations are not re-ordered:

◮ Writer thread always writes x first, so in memory x ≥ y
◮ Reader thread may get variables from different iterations of the Writer thread
◮ y is read first, thus it must be from an earlier (or the same) iteration, thus x ≥ y

It must therefore never be that x < y. Test results are:

              x < y count (several runs)
              Machine1             Machine2
Visibility2a  984, 7487, 1179      2781, 37, 182
Visibility2b  286, 21307, 1015     80975, 255, 80330
Visibility2c  0, 0, 0              0, 0, 0

For tests a and b the most probable explanation:

◮ compiler re-ordering optimizations caused a different read and/or write order

Reasoning about the behavior of two threads is difficult! Unless there are certain guarantees, e.g. the Java Memory Model.

SLIDE 14

Ordering semantics of volatile

In a Java program:

◮ writes to a volatile variable may not be re-ordered with earlier writes (possibly to different variables)
◮ reads from a volatile variable may not be re-ordered with later reads (possibly from different variables)

In our example:

x = i;
y = i;

◮ if y is volatile, it may not be ordered before the write to x;
◮ if x is volatile, no guarantees are given, i.e. the write to y may be done before the write to x.

SLIDE 15

Common misconceptions

Consider a typical cache subsystem:

◮ several levels with 64-byte granularity (cache line);
◮ when not found in Level 1, it is copied from Level 2;
◮ when not found in cache, it is copied from Main memory;
◮ when done, the cache line is copied back from Level 1 to Level 2.

It is common to think that:

  • 1. because Level 1 cache is dedicated to a core, it is possible for two cores to have their own version of a line;
  • 2. to make all changed values visible to other threads it is required to flush the core's Level 1 cache to at least Level 2 cache.

In fact, on modern systems both are false. [4]

SLIDE 16

Cache coherency

Modern cache subsystems maintain a coherence protocol; typically:

◮ only one core may hold a line in write mode, exclusively;
◮ many cores may share a line in read-only mode;
◮ when one core requests a line in write mode, the others must invalidate the line.

With such protocols only one version of a line exists in the subsystem, either

  • 1. shared (cloned) among cores in read-only mode
  • 2. or exclusive to a single core in writeable mode.

This is a simplified description; actual cache coherency protocols, like MESIF or MOESI, are more complex.

SLIDE 17

Happens-before in single thread

In a single thread actions execute as if by program order:

◮ statement A happens-before B if the program code contains them in this order
◮ “as if” allows re-ordering as long as the result is the same as with program (sequential) ordering
◮ in the extreme, the first assignment in x=5; x=y; may be discarded
◮ notice, this affects what values other threads may see

Program ordering gives absolutely no guarantees about:

◮ the relations between actions in different threads,
◮ i.e. what changes are seen in other threads, and when.

It is unknown whether and when A's effects will be visible in the second thread, even if B's are visible.


SLIDE 18

Happens-before between threads

On modern cache subsystems we know which particular write to a volatile precedes a given read from this volatile. The JMM states that:

◮ a write to a volatile variable happens-before a subsequent read of that variable;
◮ the happens-before relation is transitive: A → B and B → C implies A → C;
◮ thus, any statements before the write to a volatile happen-before statements after the read from the volatile;
◮ including reading/writing other variables
◮ again, compiler and CPU may optimize, but the results must be as if executed this way.

The Java compiler and JVM must provide this behavior for our program. This frees the programmer from thinking about low-level visibility details of the hardware.
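The transitivity rule is exactly what makes the common "publish via a volatile flag" pattern work. A minimal sketch (the class name, field names, and value 42 are illustrative):

```java
public class PublishDemo {
    private static int data = 0;                  // deliberately not volatile
    private static volatile boolean ready = false;

    public static int readAfterPublish() {
        Thread writer = new Thread(() -> {
            data = 42;       // (1) plain write
            ready = true;    // (2) volatile write; (1) happens-before (2)
        });
        writer.start();
        while (!ready) { }   // volatile read; the loop exits once (2) is visible
        // transitivity: (1) -> (2) -> the read that saw true -> here,
        // so data must be 42 at this point even though it is not volatile
        return data;
    }

    public static void main(String[] args) {
        System.out.println(readAfterPublish());  // prints 42
    }
}
```

The busy-wait loop is only for the sketch; real code would block on a latch or lock instead of spinning.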

SLIDE 19

JMM for Visibility2c example

Remember the ordering example where we make variable y volatile.

◮ Assume y is read in the second thread just after the Writer thread writes value 5;
◮ because y is volatile there is a happens-before relation between this write and read:

    Writer:  x = 5;  y = 5;
    Reader:  yl = y; xl = x;

◮ consequently, x=5 happens-before xl=x, i.e. the effects of the write must be visible to the Reader thread
◮ however, x may be 6 or 7, because no (happens-before) relations are defined for the subsequent writes
◮ they may or may not be visible to the Reader thread

SLIDE 20

JMM visibility guarantees

The JMM provides several other visibility guarantees:

◮ Write to a volatile happens-before subsequent reads of this volatile;
◮ Release of a lock happens-before a subsequent acquisition of this lock;
◮ Call to a thread's start() method happens-before the thread's run() method;
◮ i.e. everything assigned before thread.start() is visible in the thread!
◮ Actions in a thread happen-before a join() on this thread;
◮ if we wait for a thread to finish, then its effects are visible afterwards
◮ Writing a final field in the constructor happens-before the end of the constructor;
◮ implications of this are shown next.

There may be more; refer to the JMM specification for details.
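The start() and join() guarantees can be demonstrated in one short sketch (the class name, field names, and values are illustrative):

```java
public class StartJoinDemo {
    static int beforeStart;  // plain fields: visibility comes from start()/join()
    static int inThread;

    public static int demo() {
        beforeStart = 10;
        Thread t = new Thread(() -> inThread = beforeStart + 1);  // must see 10
        t.start();           // start() happens-before everything in run()
        try {
            t.join();        // everything in run() happens-before join() returning
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return inThread;     // guaranteed to be 11, with no volatile or locks
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints 11
    }
}
```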

SLIDE 21

Example code

class Holder {
    int value;
    Holder(int v) { value = v; }
}

private static /* volatile */ Holder h = new Holder(-1);

Writer thread code creates an object and initializes it to a non-zero value:

for (i = 0; i < N; i++) {
    Holder newH = new Holder((i+1)*2);
    h = newH;
}

Reader thread code calculates how many times the value is zero:

int cnt = 0;
while (i < N-1) {
    if (h.value == 0) cnt++;
}

SLIDE 22

Test results

◮ the first case does not declare the field h volatile, while the second does

              h.value == 0 count
              Machine1                 Machine2
Visibility3a  190463, 371652, 26030    15, 657, 323
Visibility3b  0, 0, 0                  0, 0, 0

The reason for the results:

◮ Java may inline method and constructor calls;
◮ after that it may re-order the inlined operations;
◮ it follows the ordering guidelines guaranteed by the JMM,
◮ without volatile, the reference h may become visible before its h.value

A thread may see an incompletely constructed object from another thread, unless it uses volatile or some other form of synchronization!

SLIDE 23

Double-Checked Locking code

◮ if the object is not visible, synchronize and create it
◮ if the object is already visible, use it

class Foo {
    private Helper helper;
    public Helper getHelper() {
        if (helper == null) {
            synchronized (this) {
                if (helper == null) {
                    helper = new Helper();
                }
            }
        }
        return helper;
    }
}

◮ As we have seen, in case of improper synchronization the object may be incomplete and cause serious faults!
◮ It may still be ok for integer or other primitive type fields (except long and double).

SLIDE 24

Quotes on Double-Checked Locking

It is generally unwise to use double-check for fields containing references to objects or arrays. Visibility of a reference read without synchronization does not guarantee visibility of non-volatile fields accessible from the reference. Even if such a reference is non-null, fields accessible via the reference without synchronization may obtain stale values. (see [2] p. 120)

The real problem with DCL is the assumption that the worst thing that can happen when reading a shared object reference without synchronization is to erroneously see a stale value (in this case, null); in that case the DCL idiom compensates for this risk by trying again with the lock held. But the worst case is actually considerably worse—it is possible to see a current value of the reference but stale values for the object’s state, meaning that the object could be seen to be in an invalid or incorrect state.

Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads. (see [3] p. 348)
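The Java-5 fix mentioned in the second quote can be sketched as follows; the Helper class, its value, and the local-variable caching are my additions, not from the slides.

```java
// Placeholder class for illustration; its field is final for good measure.
class Helper {
    final int value;
    Helper(int v) { value = v; }
}

class Foo {
    private volatile Helper helper;      // volatile is what makes DCL correct
    public Helper getHelper() {
        Helper h = helper;               // one volatile read on the fast path
        if (h == null) {
            synchronized (this) {
                h = helper;              // re-check under the lock
                if (h == null) {
                    helper = h = new Helper(42);
                }
            }
        }
        return h;
    }
}

public class DclDemo {
    public static void main(String[] args) {
        System.out.println(new Foo().getHelper().value);  // prints 42
    }
}
```

Caching the volatile field in a local variable is a common refinement: it avoids a second volatile read when the object already exists.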

SLIDE 25

General rules of safe publication

When an object is created and should be shared with other threads:

◮ do not publish it while the constructor is running;
◮ if it contains only final fields, it may be published without synchronization after the constructor has finished;
◮ otherwise publish it with synchronization after the constructor has finished:
  ◮ write the reference to a volatile variable;
  ◮ use synchronized on the same object for writer and readers;
  ◮ use any other lock;
  ◮ write to a synchronized collection;
  ◮ use AtomicReference, ...

If you do not follow these rules, your program will most probably work correctly, until your customers start reporting crashes and data corruption that are not reproducible!
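One of the listed options, publication through an AtomicReference, can be sketched as follows; the Point class, its values, and the class name are illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable holder: final fields are safe to publish once the constructor ends.
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class SafePublishDemo {
    static final AtomicReference<Point> shared = new AtomicReference<>();

    public static int demo() {
        Thread writer = new Thread(() -> shared.set(new Point(3, 4)));
        writer.start();
        Point p;
        while ((p = shared.get()) == null) { }  // spin until published
        return p.x + p.y;                       // guaranteed to see 3 and 4
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints 7
    }
}
```

AtomicReference.set/get act like volatile write/read, so the fully constructed Point is safely published to any thread that obtains the reference.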

SLIDE 26

Conclusions

◮ Inter-thread visibility concerns when changes made in one thread are seen by another
◮ It may be affected by CPU out-of-order execution, the cache subsystem, and compiler re-ordering optimizations
◮ It is hardware and compiler dependent
◮ Java provides the Java Memory Model - common rules for all platforms
◮ it makes sure these rules are correctly matched to the underlying hardware model
◮ The programmer only needs to apply the JMM happens-before relation to his/her program to make sure it is correct
◮ simpler rules like Safe Publication may be used
◮ Visibility issues are most often ignored, because the program seems to work fine
◮ This creates very unlikely race conditions that show up during production use. They cannot be reproduced or even traced.

SLIDE 27

Intel 64 and IA-32 Architectures Software Developer’s

  • Manual. Feb. 2016. URL:

http://www.intel.com/content/www/us/en/processors/ architectures-software-developer-manuals.html. Doug Lea. Concurrent Programming in Java. Second Edition: Design Principles and Patterns. 2nd. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999. ISBN: 0201310090. Tim Peierls et al. Java Concurrency in Practice. Addison-Wesley Professional, 2005. ISBN: 0321349601. Martin Thompson. CPU Cache Flushing Fallacy. Feb. 2013. URL: http://mechanical- sympathy.blogspot.com.ee/2013/02/cpu-cache- flushing-fallacy.html. Wikipedia: Memory barrier. Feb. 2016. URL: https://en.wikipedia.org/wiki/Memory_barrier.