CPL 2016, week 1: Java threads and inter-thread visibility
Oleg Batrashev, Institute of Computer Science, Tartu, Estonia
February 10, 2016
Course information
◮ 1 lecture, 1 lab each week
◮ moving lecture/lab time? (Monday 10-16)
◮ Languages (poll)
◮ Java 40%
◮ Erlang 20%
◮ Clojure 40%
◮ Total score consists of
◮ weekly homeworks 20% in total
◮ two small projects, 20% each
◮ 40% written exam
◮ I create lecture notes where I add comments to the slides
Agenda
Java threads
  Creating threads
  Measuring time
Inter-thread visibility
  Example: infinite loop
  Hardware: invalid read/write order
  Example: read/write access ordering
  Hardware: cache subsystem
  Java memory model
  Example: faulty publication
  Double-Checked Locking
  Object safe publication
Conclusions
Extend Thread class
1. Extend your class from the Thread class,
2. Create an object of your class,
3. Start your thread by calling the start() method on the object.
This may be the simplest method, but it is not very flexible. Using the Runnable interface, as shown next, is the preferred way.
◮ How many of you know this?
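The three steps above can be put into a minimal runnable sketch (the class names Counter and ExtendThreadDemo are illustrative, not from the slides):

```java
// Sketch of the Thread-subclass approach.
class Counter extends Thread {           // 1. extend the Thread class
    static int result;

    @Override
    public void run() {
        result = 42;                     // work done in the new thread
    }
}

public class ExtendThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        Counter t = new Counter();       // 2. create an object of the class
        t.start();                       // 3. start the new thread
        t.join();                        // wait for it to finish
        System.out.println("result = " + Counter.result);
    }
}
```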
Implement Runnable interface
Create a class:
class Reader implements Runnable {
    @Override
    public void run() {
        // new thread starts execution here
    }
}
Create thread object and pass it new class object:
Thread t = new Thread(new Reader());
t.start();
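Since Java 8, Runnable is a functional interface, so a lambda can replace the named class; a minimal sketch (the field ran is illustrative):

```java
public class LambdaThreadDemo {
    static volatile boolean ran = false;

    public static void main(String[] args) throws InterruptedException {
        // the lambda body is what the new thread executes
        Thread t = new Thread(() -> ran = true);
        t.start();
        t.join();   // join() also makes the write to ran visible here
        System.out.println("ran = " + ran);
    }
}
```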
System.nanoTime()
◮ measures wall-time (elapsed time)
◮ as opposed to user time or CPU time
◮ valid only within the same JVM
◮ much better accuracy than System.currentTimeMillis()
Example:
long mainStartTime = System.nanoTime();
while (i < N) i++;
t.join();
System.out.println("Time difference " + (readerStartTime - mainStartTime));
◮ join() waits for the other thread to finish, which makes its values available, e.g. readerStartTime
CountDownLatch
CountDownLatch may be used to synchronize threads [3, sec. 5.5.1].
Main thread:

latch = new CountDownLatch(1);
Thread t = new Thread(new Reader());
t.start();
latch.await();

◮ await() method suspends until the latch is zero
◮ countDown() method decreases the latch value

Reader thread:

public void run() {
    latch.countDown();
    try {
        latch.await();
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
    readerStartTime = System.nanoTime();
}
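The fragments assemble into one runnable sketch (field and class names follow the slides; the surrounding LatchDemo class is added for self-containment):

```java
import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    static CountDownLatch latch = new CountDownLatch(1);
    static volatile long readerStartTime;

    static class Reader implements Runnable {
        @Override
        public void run() {
            latch.countDown();              // latch reaches zero here
            try {
                latch.await();              // returns immediately once zero
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            readerStartTime = System.nanoTime();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new Reader());
        t.start();
        latch.await();                      // both threads proceed together
        long mainStartTime = System.nanoTime();
        t.join();                           // makes readerStartTime visible
        System.out.println("Time difference " + (readerStartTime - mainStartTime));
    }
}
```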
Visibility problems
Single-threaded programs
◮ always see the last written value of a variable:

x = 5
...
y = 2 * x
Multi-threaded programs
◮ can see variable values written from other threads,
◮ moreover, there is uncertainty in the order the changes are seen in the current thread.

There are several reasons why such a situation can happen:
1. CPU registers
2. CPU instructions executing out-of-order
3. Compiler re-ordering optimizations
Example code
Variable i is visible to the Main and Reader threads:
private static final int N = 50000000;
private static /* volatile */ int i = 0;
Main thread writes the variable in the loop:
Thread t = new Thread(new Reader());
t.start();
while (i < N) i++;
System.out.println("Main finished");
Reader thread reads it in a loop and exits:
int cnt = 0;
while (i < N) {
    cnt++;
}
System.out.println("Count " + cnt);
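The three fragments combine into one runnable demo (the Reader is written as a lambda for brevity). Note that with volatile, as below, both threads terminate; dropping volatile may make the Reader loop forever on some JVMs, since it is not guaranteed to see the updates to i:

```java
public class VisibilityDemo {
    private static final int N = 50000000;
    private static volatile int i = 0;   // drop volatile to reproduce the hang

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            int cnt = 0;
            while (i < N) { cnt++; }
            System.out.println("Count " + cnt);
        });
        t.start();
        while (i < N) i++;
        System.out.println("Main finished");
        t.join();
    }
}
```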
Memory hierarchy
The hierarchy consists of the following layers:
◮ Registers inside the core (CPU) – contain the values of instructions being executed;
◮ Dedicated cache (Level 1) – each core has its own;
◮ Shared cache (e.g. Level 2) – common for all cores;
◮ Main memory (RAM) – just your memory.
The following are typical sizes and access delays:

Layer               Access delay   Size
Registers           < 1 ns         several KBytes
Level 1 cache       1–2 ns         30–100 KBytes
Level 2 cache       5–15 ns        several MBytes
Main memory (RAM)   100–300 ns     several GBytes
Memory read/write ordering
When running machine code, the CPU (core) may execute it out-of-order, and thus write/read memory in a different order. The latter is mostly eliminated by load/store buffers:

1. all reads from cache are program ordered;
2. all writes to cache are program ordered;
3. reads may be re-ordered before writes to a different memory location;
4. there are exceptions, as described in [1] (vol. 3, sec. 8.2 “Memory ordering”);
5. a memory fence instruction [5] must be used to disallow moving reads before writes.

It is complex and platform dependent! Programmers need something simpler to grasp!
(Diagram: CPU core with registers, store buffer and load buffer, connected to cache and memory.)
Example code
private static volatile boolean isRunning = true;
private static /* volatile only in b case */ int x = 0;
private static /* volatile only in c case */ int y = 0;
Writer thread writes x and y, in this order, with increasing values:
for (int i = 0; i < N; i++) {
    x = i;
    y = i;
}
isRunning = false;
Reader thread reads x and y in the opposite order:
int xl, yl, cnt = 0;
do {
    yl = y;
    xl = x;
    if (xl < yl) cnt++;
} while (isRunning);
cnt tells how many times x < y.
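The fragments assemble into one runnable sketch of case c, where y is volatile (N is reduced from the slides' 50 000 000 to keep the run short; cnt is stored in a field so it can be inspected after join()). With the volatile write to y after the plain write to x, and the volatile read of y before the plain read of x, the JMM guarantees xl ≥ yl, so cnt stays 0:

```java
public class ReorderDemo {
    static final int N = 5_000_000;          // reduced from the slides' value
    static volatile boolean isRunning = true;
    static int x = 0;
    static volatile int y = 0;               // case c: y is volatile
    static int cnt;                          // result, read after join()

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            int xl, yl, c = 0;
            do {
                yl = y;                      // volatile read first
                xl = x;                      // then the plain read
                if (xl < yl) c++;
            } while (isRunning);
            cnt = c;
        });
        reader.start();
        for (int i = 0; i < N; i++) {
            x = i;                           // plain write first
            y = i;                           // then the volatile write
        }
        isRunning = false;
        reader.join();
        System.out.println("x < y count: " + cnt);   // always 0 in case c
    }
}
```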
Example analysis and results
If operations are not re-ordered:

◮ Writer thread always writes x first, so in memory x ≥ y
◮ Reader thread may get variables from different iterations of the Writer thread
◮ y is read first, so its value must be from the same or an earlier iteration, thus x ≥ y

It must therefore never be that x < y. Test results are:

x < y count (several runs)   Machine1              Machine2
Visibility2a                 984, 7487, 1179       2781, 37, 182
Visibility2b                 286, 21307, 1015      80975, 255, 80330
Visibility2c                 0, 0, 0               0, 0, 0

For tests a and b the most probable explanation:

◮ compiler re-ordering optimizations caused a different read and/or write order

Reasoning about the behavior of two threads is difficult! Unless there are certain guarantees, e.g. the Java Memory Model.
Ordering semantics of volatile
In a Java program:

◮ writes to a volatile variable may not be re-ordered with earlier writes (possibly to different variables)
◮ reads from a volatile variable may not be re-ordered with later reads (possibly from different variables)

In our example:

x = i;
y = i;

◮ if y is volatile, its write may not be ordered before the write to x;
◮ if x is volatile, no guarantees are given, i.e. the write to y may be done before the write to x.
Common misconceptions
Consider a typical cache subsystem:

◮ several levels with 64-byte granularity (cache line);
◮ when not found in Level 1, it is copied from Level 2;
◮ when not found in cache, it is copied from Main memory;
◮ when done, the cache line is copied back from Level 1 to Level 2.

It is common to think that:

1. because Level 1 cache is dedicated to a core, it is possible for two cores to have their own version of a line;
2. to make all changed values visible to other threads it is required to flush the core's Level 1 cache to at least the Level 2 cache.

In fact, on modern systems both are false. [4]
Cache coherency
Modern cache subsystems maintain a coherence protocol, typically:

◮ only one core may hold a line in write mode, exclusively;
◮ many cores may share a line in read-only mode;
◮ when one requests a line in write mode, the others must invalidate the line.

With such protocols only one version of a line exists in the subsystem, either

1. shared (cloned) among cores in read-only mode,
2. or exclusive to a single core in writeable mode.

This is a simplified description; actual cache coherence protocols, like MESIF or MOESI, are more complex.
Happens-before in single thread
In a single thread actions execute as if by program order:

◮ statement A happens-before B if program code contains them in this order
◮ “as if” allows re-ordering as long as the result is the same as with program (sequential) ordering
◮ in the extreme, the first assignment in x=5; x=y; may be discarded
◮ notice, this affects what values other threads may see

Program ordering gives absolutely no guarantees about:

◮ the relations between actions in different threads,
◮ i.e. what changes are seen in other threads, and when.

It is unknown whether and when A's effects will be visible in the second thread, even if B's are.
Happens-before between threads
On modern cache subsystems we know which particular write to a volatile precedes a given read from this volatile. The JMM states that:

◮ a write to a volatile variable happens-before a subsequent read of that variable;
◮ the happens-before relation is transitive: A → B and B → C implies A → C;
◮ thus, any statements before the write to the volatile happen-before statements after the read from the volatile;
◮ including reads/writes of other variables;
◮ again, the compiler and CPU may optimize, but the results must be as if executed this way.

The Java compiler and JVM must provide this behavior for our program. This frees us from thinking about low-level visibility details of the hardware.
JMM for Visibility2c example
Remember the ordering example where we make variable y volatile.

◮ Assume y is read by the second thread just after the writer thread writes value 5;
◮ because y is volatile, there is a happens-before relation between this write and read:

    x=5 → y=5 happens-before yl=y → xl=x

◮ consequently, x=5 happens-before xl=x, i.e. the effects of the write must be visible to the Reader thread;
◮ however, x may be 6 or 7, because no (happens-before) relations are defined for the subsequent writes;
◮ they may or may not be visible to the Reader thread.
JMM visibility guarantees
The JMM provides several other visibility guarantees:

◮ A write to a volatile happens-before subsequent reads of this volatile;
◮ Release of a lock happens-before a subsequent take of this lock;
◮ A call to a thread's start method happens-before the thread's run method;
  ◮ i.e. everything assigned before thread.start() is visible in the thread!
◮ Actions in a thread happen-before another thread's join on that thread;
  ◮ if we wait for a thread to finish, then its effects are visible afterwards;
◮ Writing a final field in a constructor happens-before the end of the constructor;
  ◮ implications of this are shown next.

There may be more; refer to the JMM specification for details.
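The start() and join() guarantees can be demonstrated without any volatile or locks; a minimal sketch (field names are illustrative):

```java
public class StartJoinDemo {
    static int data = 0;                 // plain field: no volatile, no locks
    static int seenInThread;

    public static void main(String[] args) throws InterruptedException {
        data = 17;                       // happens-before t.start()
        Thread t = new Thread(() -> {
            seenInThread = data;         // guaranteed to see 17
            data = 42;                   // happens-before join() returning
        });
        t.start();
        t.join();
        System.out.println(seenInThread + " " + data);   // prints "17 42"
    }
}
```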
Example code
class Holder {
    int value;
    Holder(int v) { value = v; }
}

private static /* volatile */ Holder h = new Holder(-1);
Writer thread code creates objects and initializes them to non-zero values:
for (i = 0; i < N; i++) {
    Holder newH = new Holder((i + 1) * 2);
    h = newH;
}
Reader thread code counts how many times the value is zero:

int cnt = 0;
while (i < N - 1) {
    if (h.value == 0) cnt++;
}
Test results
◮ the first case does not declare the field h volatile, while the second does

h.value == 0 count   Machine1                 Machine2
Visibility3a         190463, 371652, 26030    15, 657, 323
Visibility3b         0, 0, 0                  0, 0, 0

The reason for the results:

◮ Java may inline method and constructor calls;
◮ after that it may re-order the inlined operations;
◮ it follows the ordering guidelines guaranteed by the JMM;
◮ without volatile, the reference h may be visible before the write to h.value.

A thread may see an incompletely created object from another thread, unless it uses volatile or some other form of synchronization!
Double-Checked Locking code
◮ if the object is not visible, synchronize and create it
◮ if the object is already visible, use it

class Foo {
    private Helper helper;
    public Helper getHelper() {
        if (helper == null) {
            synchronized (this) {
                if (helper == null) {
                    helper = new Helper();
                }
            }
        }
        return helper;
    }
}

◮ As we have seen, in case of improper synchronization the object may be incomplete and cause serious faults!
◮ It may still be ok for integer or other primitive type fields (except long and double).
Quotes on Double-Checked Locking
It is generally unwise to use double-check for fields containing references to objects or arrays. Visibility of a reference read without synchronization does not guarantee visibility of non-volatile fields accessible from the reference. Even if such a reference is non-null, fields accessible via the reference without synchronization may obtain stale values. (see [2] p. 120)

The real problem with DCL is the assumption that the worst thing that can happen when reading a shared object reference without synchronization is to erroneously see a stale value (in this case, null); in that case the DCL idiom compensates for this risk by trying again with the lock held. But the worst case is actually considerably worse—it is possible to see a current value of the reference but stale values for the object's state, meaning that the object could be seen to be in an invalid or incorrect state.

Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads. (see [3] p. 348)
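As the last quote notes, since Java 5 DCL works when the field is volatile; a sketch of the corrected idiom (Helper and SafeFoo are stand-in names; the local variable h is a common optimization that keeps the fast path to a single volatile read):

```java
class Helper {
    final int value;
    Helper() { value = 42; }
}

class SafeFoo {
    private volatile Helper helper;      // volatile is the Java 5+ fix

    public Helper getHelper() {
        Helper h = helper;               // one volatile read on the fast path
        if (h == null) {
            synchronized (this) {
                h = helper;              // re-check under the lock
                if (h == null) {
                    helper = h = new Helper();
                }
            }
        }
        return h;
    }
}
```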
General rules of safe publication
When an object is created and should be shared to other threads:
◮ do not publish while the constructor is running;
◮ if it contains only final fields, it may be published without synchronization after the constructor has finished;
◮ otherwise publish with synchronization after the constructor has finished:
  ◮ write the reference to a volatile variable;
  ◮ use synchronized on the same object for writer and readers;
  ◮ use any other lock;
  ◮ write to a synchronized collection;
  ◮ use AtomicReference, ...
If you do not follow these rules, your program will most probably work correctly, until your customers start reporting crashes and data corruption that are not reproducible!
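One of the listed options, publication through an AtomicReference, might look like this (the Config class and its field are illustrative; any of the other listed mechanisms would work the same way):

```java
import java.util.concurrent.atomic.AtomicReference;

class Config {                           // a class with a non-final field
    int port;
    Config(int port) { this.port = port; }
}

public class PublishDemo {
    static final AtomicReference<Config> current = new AtomicReference<>();

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            Config c = new Config(8080); // finish construction first...
            current.set(c);              // ...then publish the reference
        });
        writer.start();
        writer.join();
        // any thread that sees the reference also sees port == 8080
        System.out.println(current.get().port);
    }
}
```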
Conclusions
◮ Inter-thread visibility concerns when changes made in one thread are seen by another
◮ It may be affected by CPU out-of-order execution, the cache subsystem, and compiler re-ordering optimizations
◮ It is hardware and compiler dependent
◮ Java provides the Java Memory Model - common rules for all platforms
  ◮ it makes sure these rules are correctly matched to the underlying hardware model
◮ The programmer only needs to apply the JMM happens-before relation to his/her program to make sure it is correct
  ◮ simpler rules like Safe Publication may be used
◮ Visibility issues are most often ignored, because the program seems to work fine
◮ This creates very unlikely race conditions that show up during production usage. They cannot be reproduced or even traced.
[1] Intel 64 and IA-32 Architectures Software Developer's Manual. Feb. 2016. URL: