CPL 2016, week 1: Java threads and inter-thread visibility. Oleg Batrashev. PowerPoint PPT presentation.





SLIDE 1

CPL 2016, week 1

Java threads and inter-thread visibility Oleg Batrashev

Institute of Computer Science, Tartu, Estonia

February 10, 2016

SLIDE 2

Course information

◮ 1 lecture, 1 lab each week

◮ moving lecture/lab time? (Monday 10-16)

◮ Languages (poll)

◮ Java 40%
◮ Erlang 20%
◮ Clojure 40%

◮ Total score consists of

◮ weekly homeworks 20% in total
◮ two small projects, 20% each
◮ written exam 40%

◮ I create lecture notes where I add comments to the slides

SLIDE 3

Agenda

Java threads
  Creating threads
  Measuring time
Inter-thread visibility
  Example: infinite loop
  Hardware: invalid read/write order
  Example: read/write access ordering
  Hardware: cache subsystem
Java memory model
  Example: faulty publication
  Double-Checked Locking
  Object safe publication
Conclusions

SLIDE 4

Extend Thread class

  • 1. Extend your class from the Thread class,
  • 2. Create an object of your class,
  • 3. Start your thread by calling the start() method on the object.

This may be the simplest method, but it is not very flexible. Using the Runnable interface, as shown next, is the preferred way.

◮ How many of you know this?
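The three steps above can be sketched as a complete program. The SumThread class and its summing task are illustrative, not from the slides; note that join() is what makes the thread's result visible afterwards, a point the course returns to later.

```java
// Step 1: extend Thread and override run().
class SumThread extends Thread {
    private final int n;
    long result;                      // plain field: join() makes it visible
    SumThread(int n) { this.n = n; }
    @Override public void run() {
        long s = 0;
        for (int i = 1; i <= n; i++) s += i;
        result = s;
    }
}

public class ExtendThreadDemo {
    public static long sumInThread(int n) {
        SumThread t = new SumThread(n);   // step 2: create an object
        t.start();                        // step 3: start the thread
        try {
            t.join();                     // wait; the thread's writes are now visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return t.result;
    }

    public static void main(String[] args) {
        System.out.println(sumInThread(100));  // prints 5050
    }
}
```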

SLIDE 5

Implement Runnable interface

Create a class:

class Reader implements Runnable {
    @Override
    public void run() {
        // new thread starts execution here
    }
}

Create thread object and pass it new class object:

Thread t = new Thread(new Reader());
t.start();
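As an aside not on the slide: since Java 8, Runnable is a functional interface, so a lambda can replace the Reader class. A minimal sketch (the class name, AtomicInteger box, and the value 42 are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RunnableLambdaDemo {
    public static int runTask() {
        AtomicInteger box = new AtomicInteger();
        // The lambda body is the run() method of an anonymous Runnable.
        Thread t = new Thread(() -> box.set(42));
        t.start();
        try {
            t.join();    // wait for the thread; its writes are then visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return box.get();
    }

    public static void main(String[] args) {
        System.out.println(runTask());  // prints 42
    }
}
```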

SLIDE 6

System.nanoTime()

◮ measures wall-time (elapsed time)

◮ as opposed to user time or CPU time

◮ valid only within the same JVM
◮ much better accuracy than currentTimeMillis()

Example:

long mainStartTime = System.nanoTime();
while (i < N) i++;
t.join();
System.out.println("Time difference " + (readerStartTime - mainStartTime));

◮ join() waits for the other thread to finish, which makes its

values available, e.g. readerStartTime
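The fragment above can be sketched as a self-contained program. The class name and the lambda-based reader are my additions; the point is that after join() the reader's write to readerStartTime is guaranteed visible even without synchronization on the field itself.

```java
public class NanoTimeDemo {
    static long readerStartTime;    // plain field: join() makes it visible

    public static long measure() {
        Thread t = new Thread(() -> readerStartTime = System.nanoTime());
        long mainStartTime = System.nanoTime();
        t.start();
        try {
            t.join();   // waits for the thread; its writes are now visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return readerStartTime - mainStartTime;
    }

    public static void main(String[] args) {
        System.out.println("Time difference " + measure());
    }
}
```

Since nanoTime() is monotonic within one JVM and the reader's call happens after main's, the difference is never negative.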

SLIDE 7

CountDownLatch

CountDownLatch may be used to synchronize threads [3, sec. 5.5.1]:

latch = new CountDownLatch(1);
Thread t = new Thread(new Reader());
t.start();
latch.await();

◮ await() method suspends until the latch is zero
◮ countDown() method decreases the latch value

public void run() {
    latch.countDown();
    try {
        latch.await();
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
    readerStartTime = System.nanoTime();
}
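Putting the two fragments together gives a runnable sketch (the class name and lambda are my additions; the logic follows the slide: the reader counts down and both threads then pass await() at roughly the same moment):

```java
import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    static final CountDownLatch latch = new CountDownLatch(1);
    static long readerStartTime;      // visible to main after join()

    public static long demo() {
        Thread t = new Thread(() -> {
            latch.countDown();        // releases main ...
            try {
                latch.await();        // ... and returns immediately: latch is zero
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            readerStartTime = System.nanoTime();
        });
        t.start();
        try {
            latch.await();            // suspends until the reader counts down
            t.join();                 // wait so readerStartTime is set and visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return readerStartTime;
    }

    public static void main(String[] args) {
        System.out.println("reader started at " + demo());
    }
}
```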

SLIDE 8

Visibility problems

Single-threaded programs

◮ always see the last written value of a variable

    x = 5
    ...
    y = 2 * x

Multi-threaded programs

◮ can see variable values written from other threads,
◮ moreover, there is uncertainty in the order the changes are seen in the current thread.

There are several reasons why such a situation can happen:

  • 1. CPU registers
  • 2. CPU instructions executing out-of-order
  • 3. Compiler re-ordering optimizations
SLIDE 9

Example code

Variable i is visible to the Main and Reader threads:

private static final int N = 50000000;
private static /* volatile */ int i = 0;

Main thread writes the variable in the loop:

Thread t = new Thread(new Reader());
t.start();
while (i < N) i++;
System.out.println("Main finished");

Reader thread reads it in a loop and exits:

int cnt = 0;
while (i < N) {
    cnt++;
}
System.out.println("Count " + cnt);
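A complete, runnable version of this example is sketched below (the class name, the lambda reader, and the long counter are my additions). With volatile commented in, as here, the Reader is guaranteed to eventually see i reach N and terminate; with a plain int it may spin forever on some JIT/hardware combinations, which is the point of the demonstration.

```java
public class VisibilityDemo {
    private static final int N = 50_000_000;
    // volatile guarantees the Reader eventually sees main's writes to i.
    private static volatile int i = 0;

    public static long run() {
        final long[] cntBox = new long[1];   // long: the reader may spin a lot
        Thread reader = new Thread(() -> {
            long cnt = 0;
            while (i < N) { cnt++; }
            cntBox[0] = cnt;
        });
        reader.start();
        while (i < N) i++;
        try {
            reader.join();                   // join() makes cntBox[0] visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return cntBox[0];
    }

    public static void main(String[] args) {
        System.out.println("Count " + run());
    }
}
```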

SLIDE 10

Memory hierarchy

The hierarchy consists of the following layers:

◮ Registers inside the core (CPU) – contain the values of instructions being executed;
◮ Dedicated cache (Level 1) – each core has its own;
◮ Shared cache (e.g. Level 2) – common for all cores;
◮ Main memory (RAM) – just your memory.

The following are typical sizes and access delays:

                      Access delay   Size
Registers             < 1 ns         several KBytes
Level 1 cache         1-2 ns         30-100 KBytes
Level 2 cache         5-15 ns        several MBytes
Main memory (RAM)     100-300 ns     several GBytes

SLIDE 11

Memory read/write ordering

When running machine code, the CPU (core) may execute it out-of-order, and thus write/read memory in a different order. The latter is mostly eliminated by load/store buffers:

  • 1. all reads from cache are program ordered;
  • 2. all writes to cache are program ordered;
  • 3. reads may be re-ordered before writes to different memory locations;
  • 4. there are exceptions as described in [1] (vol. 3, sec. 8.2 “Memory ordering”);
  • 5. a memory fence instruction [5] must be used to disallow moving reads before writes.

It is complex and platform dependent! The programmer needs something simpler to grasp!

(Diagram: CPU registers, store/load buffers, cache, memory)

SLIDE 12

Example code

private static volatile boolean isRunning = true;
private static /* volatile only in b case */ int x = 0;
private static /* volatile only in c case */ int y = 0;

Writer thread writes x and then y, with increasing values:

for (int i = 0; i < N; i++) {
    x = i;
    y = i;
}
isRunning = false;

Reader thread reads x and y in the opposite order:

int xl, yl, cnt = 0;
do {
    yl = y;
    xl = x;
    if (xl < yl) cnt++;
} while (isRunning);

cnt tells how many times x < y.
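The experiment can be sketched as one runnable program. The class name, the lambda reader, and the reduced N are my additions; this version hard-codes the "c" case (y volatile), for which the later slides argue the count is guaranteed to be zero.

```java
public class OrderingDemo {
    private static volatile boolean isRunning = true;
    private static int x = 0;                 // plain field
    private static volatile int y = 0;        // the "c" case: y is volatile

    public static int countInversions() {
        final int N = 1_000_000;              // smaller than the slide's N, for speed
        final int[] cntBox = new int[1];
        Thread reader = new Thread(() -> {
            int cnt = 0;
            do {
                int yl = y;                   // read y first ...
                int xl = x;                   // ... then x
                if (xl < yl) cnt++;
            } while (isRunning);
            cntBox[0] = cnt;
        });
        reader.start();
        for (int i = 0; i < N; i++) { x = i; y = i; }   // write x first, then y
        isRunning = false;
        try {
            reader.join();                    // join() makes cntBox[0] visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return cntBox[0];
    }

    public static void main(String[] args) {
        // With y volatile the JMM guarantees xl >= yl, so this prints 0.
        System.out.println("x < y count: " + countInversions());
    }
}
```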

SLIDE 13

Example analysis and results

If operations are not re-ordered:

◮ Writer thread always writes x first, so in memory x ≥ y
◮ Reader thread may get variables from different iterations of the Writer thread
◮ y is read first, thus it must be from an earlier (or the same) iteration, thus x ≥ y

It must therefore never be that x < y. Test results are:

              x < y count (several runs)
              Machine1             Machine2
Visibility2a  984, 7487, 1179      2781, 37, 182
Visibility2b  286, 21307, 1015     80975, 255, 80330
Visibility2c  0, 0, 0              0, 0, 0

For tests a and b the most probable explanation:

◮ compiler re-ordering optimizations caused a different read and/or write order

Reasoning about the behavior of two threads is difficult! Unless there are certain guarantees, e.g. the Java Memory Model.

SLIDE 14

Ordering semantics of volatile

In a Java program:

◮ writes to a volatile variable may not be re-ordered with earlier writes (possibly to different variables)
◮ reads from a volatile variable may not be re-ordered with later reads (possibly from different variables)

In our example:

x = i;
y = i;

◮ if y is volatile, it may not be ordered before the write to x;
◮ if x is volatile, no guarantees are given, i.e. the write to y may be done before the write to x.

SLIDE 15

Common misconceptions

Consider a typical cache subsystem:

◮ several levels with 64-byte granularity (cache line);
◮ when not found in Level 1, it is copied from Level 2;
◮ when not found in cache, it is copied from Main memory;
◮ when done, the cache line is copied back from Level 1 to Level 2.

It is common to think that:

  • 1. because Level 1 cache is dedicated to a core, it is possible for two cores to have their own version of a line;
  • 2. to make all changed values visible to other threads it is required to flush the core's Level 1 cache to at least Level 2 cache.

In fact, on modern systems both are false. [4]

SLIDE 16

Cache coherency

Modern cache subsystems maintain a coherence protocol; typically:

◮ only one core may hold a line in write mode, exclusively;
◮ many cores may share a line in read-only mode;
◮ when one core requests a line in write mode, the others must invalidate the line.

With such protocols only one version of a line exists in the subsystem, either

  • 1. shared (cloned) among cores in read-only mode
  • 2. or exclusive to a single core in writeable mode.

This is a simplified description; actual cache coherency protocols, like MESIF or MOESI, are more complex.

SLIDE 17

Happens-before in single thread

In a single thread actions execute as if by program order:

◮ statement A happens-before B if the program code contains them in this order
◮ “as if” allows re-ordering as long as the result is the same as with program (sequential) ordering
◮ in the extreme, the first assignment in x=5; x=y; may be discarded
◮ notice, this affects what values other threads may see

Program ordering gives absolutely no guarantees about:

◮ the relations between actions in different threads,
◮ i.e. what changes are seen in other threads, and when.

It is unknown whether and when A's effects will be visible in the second thread, even if B's are visible.


SLIDE 18

Happens-before between threads

On modern cache subsystems we know which particular write to a volatile precedes a given read from this volatile. The JMM states that:

◮ a write to a volatile variable happens-before a subsequent read of that variable;
◮ the happens-before relation is transitive: A → B and B → C implies A → C;
◮ thus, any statements before the write to a volatile happen-before statements after the read from the volatile;
◮ including reading/writing other variables
◮ again, compiler and CPU may optimize, but the results must be as if executed this way.

The Java compiler and JVM must provide this behavior for our program. This frees the programmer from thinking about low-level visibility details of the hardware.
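The transitivity rule is exactly what makes the common "publish via a volatile flag" pattern work. A minimal sketch (the class name, field names, and value 42 are illustrative):

```java
public class PublishDemo {
    private static int data = 0;                  // deliberately not volatile
    private static volatile boolean ready = false;

    public static int readAfterPublish() {
        Thread writer = new Thread(() -> {
            data = 42;       // (1) plain write
            ready = true;    // (2) volatile write; (1) happens-before (2)
        });
        writer.start();
        while (!ready) { }   // volatile read; the loop exits once (2) is visible
        // transitivity: (1) -> (2) -> the read that saw true -> here,
        // so data must be 42 at this point even though it is not volatile
        return data;
    }

    public static void main(String[] args) {
        System.out.println(readAfterPublish());  // prints 42
    }
}
```

The busy-wait loop is only for the sketch; real code would block on a latch or lock instead of spinning.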

SLIDE 19

JMM for Visibility2c example

Remember the ordering example where we make variable y volatile.

◮ Assume y is read in the second thread just after the Writer thread writes value 5;
◮ because y is volatile there is a happens-before relation between this write and read:

    Writer:  x = 5;  y = 5;
    Reader:  yl = y; xl = x;

◮ consequently, x=5 happens-before xl=x, i.e. the effects of the write must be visible to the Reader thread
◮ however, x may be 6 or 7, because no (happens-before) relations are defined for the subsequent writes
◮ they may or may not be visible to the Reader thread

SLIDE 20

JMM visibility guarantees

The JMM provides several other visibility guarantees:

◮ Write to a volatile happens-before subsequent reads of this volatile;
◮ Release of a lock happens-before a subsequent acquisition of this lock;
◮ Call to a thread's start() method happens-before the thread's run() method;
◮ i.e. everything assigned before thread.start() is visible in the thread!
◮ Actions in a thread happen-before a join() on this thread;
◮ if we wait for a thread to finish, then its effects are visible afterwards
◮ Writing a final field in the constructor happens-before the end of the constructor;
◮ implications of this are shown next.

There may be more; refer to the JMM specification for details.
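The start() and join() guarantees can be demonstrated in one short sketch (the class name, field names, and values are illustrative):

```java
public class StartJoinDemo {
    static int beforeStart;  // plain fields: visibility comes from start()/join()
    static int inThread;

    public static int demo() {
        beforeStart = 10;
        Thread t = new Thread(() -> inThread = beforeStart + 1);  // must see 10
        t.start();           // start() happens-before everything in run()
        try {
            t.join();        // everything in run() happens-before join() returning
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return inThread;     // guaranteed to be 11, with no volatile or locks
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints 11
    }
}
```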

SLIDE 21

Example code

class Holder {
    int value;
    Holder(int v) { value = v; }
}

private static /* volatile */ Holder h = new Holder(-1);

Writer thread code creates an object and initializes it to a non-zero value:

for (i = 0; i < N; i++) {
    Holder newH = new Holder((i+1)*2);
    h = newH;
}

Reader thread code calculates how many times the value is zero:

int cnt = 0;
while (i < N-1) {
    if (h.value == 0) cnt++;
}

SLIDE 22

Test results

◮ the first case does not declare the field h volatile, while the second does

              h.value == 0 count
              Machine1                 Machine2
Visibility3a  190463, 371652, 26030    15, 657, 323
Visibility3b  0, 0, 0                  0, 0, 0

The reason for the results:

◮ Java may inline method and constructor calls;
◮ after that it may re-order the inlined operations;
◮ it follows the ordering guidelines guaranteed by the JMM,
◮ without volatile, the reference h may become visible before its h.value

A thread may see an incompletely constructed object from another thread, unless it uses volatile or some other form of synchronization!

SLIDE 23

Double-Checked Locking code

◮ if the object is not visible, synchronize and create it
◮ if the object is already visible, use it

class Foo {
    private Helper helper;
    public Helper getHelper() {
        if (helper == null) {
            synchronized (this) {
                if (helper == null) {
                    helper = new Helper();
                }
            }
        }
        return helper;
    }
}

◮ As we have seen, in case of improper synchronization the object may be incomplete and cause serious faults!
◮ It may still be ok for integer or other primitive type fields (except long and double).

SLIDE 24

Quotes on Double-Checked Locking

It is generally unwise to use double-check for fields containing references to objects or arrays. Visibility of a reference read without synchronization does not guarantee visibility of non-volatile fields accessible from the reference. Even if such a reference is non-null, fields accessible via the reference without synchronization may obtain stale values. (see [2] p. 120)

The real problem with DCL is the assumption that the worst thing that can happen when reading a shared object reference without synchronization is to erroneously see a stale value (in this case, null); in that case the DCL idiom compensates for this risk by trying again with the lock held. But the worst case is actually considerably worse—it is possible to see a current value of the reference but stale values for the object’s state, meaning that the object could be seen to be in an invalid or incorrect state.

Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads. (see [3] p. 348)
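The Java-5 fix mentioned in the second quote can be sketched as follows; the Helper class, its value, and the local-variable caching are my additions, not from the slides.

```java
// Placeholder class for illustration; its field is final for good measure.
class Helper {
    final int value;
    Helper(int v) { value = v; }
}

class Foo {
    private volatile Helper helper;      // volatile is what makes DCL correct
    public Helper getHelper() {
        Helper h = helper;               // one volatile read on the fast path
        if (h == null) {
            synchronized (this) {
                h = helper;              // re-check under the lock
                if (h == null) {
                    helper = h = new Helper(42);
                }
            }
        }
        return h;
    }
}

public class DclDemo {
    public static void main(String[] args) {
        System.out.println(new Foo().getHelper().value);  // prints 42
    }
}
```

Caching the volatile field in a local variable is a common refinement: it avoids a second volatile read when the object already exists.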

SLIDE 25

General rules of safe publication

When an object is created and should be shared with other threads:

◮ do not publish it while the constructor is running;
◮ if it contains only final fields, it may be published without synchronization after the constructor has finished;
◮ otherwise publish it with synchronization after the constructor has finished:
  ◮ write the reference to a volatile variable;
  ◮ use synchronized on the same object for writer and readers;
  ◮ use any other lock;
  ◮ write to a synchronized collection;
  ◮ use AtomicReference, ...

If you do not follow these rules, your program will most probably work correctly, until your customers start reporting crashes and data corruption that are not reproducible!
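One of the listed options, publication through an AtomicReference, can be sketched as follows; the Point class, its values, and the class name are illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable holder: final fields are safe to publish once the constructor ends.
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class SafePublishDemo {
    static final AtomicReference<Point> shared = new AtomicReference<>();

    public static int demo() {
        Thread writer = new Thread(() -> shared.set(new Point(3, 4)));
        writer.start();
        Point p;
        while ((p = shared.get()) == null) { }  // spin until published
        return p.x + p.y;                       // guaranteed to see 3 and 4
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints 7
    }
}
```

AtomicReference.set/get act like volatile write/read, so the fully constructed Point is safely published to any thread that obtains the reference.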

SLIDE 26

Conclusions

◮ Inter-thread visibility concerns when changes made in one thread are seen by another
◮ It may be affected by CPU out-of-order execution, the cache subsystem, and compiler re-ordering optimizations
◮ It is hardware and compiler dependent
◮ Java provides the Java Memory Model - common rules for all platforms
◮ it makes sure these rules are correctly matched to the underlying hardware model
◮ The programmer only needs to apply the JMM happens-before relation to his/her program to make sure it is correct
◮ simpler rules like Safe Publication may be used
◮ Visibility issues are most often ignored, because the program seems to work fine
◮ This creates very unlikely race conditions that show up during production use. They cannot be reproduced or even traced.

SLIDE 27

Intel 64 and IA-32 Architectures Software Developer’s

  • Manual. Feb. 2016. URL:

http://www.intel.com/content/www/us/en/processors/ architectures-software-developer-manuals.html. Doug Lea. Concurrent Programming in Java. Second Edition: Design Principles and Patterns. 2nd. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999. ISBN: 0201310090. Tim Peierls et al. Java Concurrency in Practice. Addison-Wesley Professional, 2005. ISBN: 0321349601. Martin Thompson. CPU Cache Flushing Fallacy. Feb. 2013. URL: http://mechanical- sympathy.blogspot.com.ee/2013/02/cpu-cache- flushing-fallacy.html. Wikipedia: Memory barrier. Feb. 2016. URL: https://en.wikipedia.org/wiki/Memory_barrier.