Lock-Free, Wait-Free and Multi-core Programming Roger Deran - PowerPoint PPT Presentation

Lock-Free, Wait-Free and Multi-core Programming
Roger Deran
boilerbay.com
Fast, Efficient Concurrent Maps
AirMap

Lock-Free and Wait-Free Data Structures
Overview
- The Java Maps that use Lock-Free techniques
- Graphical performance of Map data structures
- Consensus number concept
- The Ubiquitous 'CAS' primitive
- Implementing AtomicInteger using CAS
- Implementing Java ConcurrentSkipListMap
- Volatile variables: vital but confusing
- Memory Barriers
  - so esoteric, we buy mutually free beer

Lock-Free and Wait-Free Data Structures
- For multiple threads sharing data
- Fast
  - Extreme concurrency with many cores active
  - Extreme performance: no expensive wait queues
  - Extremely low latency (wait-free)
- Constructed from very powerful, simple primitives
- Algorithms are difficult, so usually use canned ones
- Active research on these precious techniques

Lock-Free and Wait-Free Data Structures
- Can implement fast locks with wait queues
  - Mutexes, RW locks, Semaphores, Condition Variables
- Can implement fast Atomics
  - Integers, Longs, Booleans, References
- Can implement multi-core data structures
  - HashMaps or Sets, Tree Maps or Sets, Queues, Lists, Stacks

Lock-Free and Wait-Free Data Structures
- Lock-Free
  - Not fair between threads
  - Always has a retry loop
  - Guarantees progress of some thread but not which one
  - Not a spin lock! Spins can almost stall the whole system
- Wait-Free: beats Lock-Free
  - Fair between threads
  - Every thread is guaranteed to make progress in finite time
- Rely on GC for unique ids, can generate much garbage
- More difficult in C, C++, boost::lockfree (the 'ABA' problem)
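To make the "always has a retry loop" point concrete, here is a minimal sketch of a lock-free stack (the classic Treiber stack), which is not taken from the slides. The class name LockFreeStack and its methods are hypothetical names chosen for illustration. Every push allocates a fresh node, and the garbage collector never recycles a node that some thread still references, which is why this Java version does not need extra protection against the 'ABA' problem.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration class; not part of the JDK or of AirMap.
final class LockFreeStack<T> {
    // Immutable node: once published, item and next never change.
    private static final class Node<T> {
        final T item;
        final Node<T> next;
        Node(T item, Node<T> next) { this.item = item; this.next = next; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T item) {
        for (;;) {                                   // the retry loop
            Node<T> oldTop = top.get();
            Node<T> newTop = new Node<>(item, oldTop);
            if (top.compareAndSet(oldTop, newTop)) {
                return;                              // this thread made progress
            }
            // CAS failed: some other thread made progress instead, so retry.
        }
    }

    // Returns null when the stack is empty.
    T pop() {
        for (;;) {
            Node<T> oldTop = top.get();
            if (oldTop == null) {
                return null;
            }
            if (top.compareAndSet(oldTop, oldTop.next)) {
                return oldTop.item;
            }
        }
    }
}
```

A failed CAS here always means another thread succeeded, so the structure as a whole keeps moving even though an individual thread may retry an unbounded number of times. That is the lock-free guarantee, as opposed to the wait-free one.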

The standard Java Map Classes
The Concurrent* classes are Lock-Free and AirMap is mostly Lock-Free
AirMap is a multi-core ConcurrentNavigableMap from boilerbay.com: 90% faster, 50% more capacity, 7x faster iteration.

Map Feature Comparison

Feature                    ConcurrentSkipListMap  ConcurrentHashMap  HashMap  TreeMap  AirMap
put/get/remove             x                      x                  x        x        x
ordered access             x                                                  x        x
thread safe                x                      x                                    x
most memory efficient                                                                  x
fastest multicore access                                                               x

Lock-Free Map Random Cumulative Put
Decreasing exponential speed with Map size

Lock-Free Map Concurrent Random Put

Map Concurrent Random Access Mixed
4 thread put, 4 thread get

Lock-Free 8-Thread Remove Speed
JVM size versus time shows GC efficiency

Lock-Free One-Thread Iterator Speed

Lock-Free One-Thread Iterator Speed
Log scale shows the entire spectrum

Map Entry size vs Map size
Size of basic Key/Value entry in bytes given log Map size

Consensus Number
- Any given concurrency primitive has one
- How many Threads can be synchronized?
- Consensus 1: Surprisingly, memory is weak
  - Atomic read or write to memory. Dekker's Algorithm
- Consensus 2: Another surprise, many are weak
  - Queues, test-and-set, swap, getAndAdd, stacks
- Consensus infinity: A few vital powerful primitives
  - Augmented queue (like socket poll)
  - Compare And Set "CAS" type instruction
  - Load-Link and Store-Conditional instruction pair
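One way to see why CAS sits at "consensus infinity" is that a single CAS-able cell lets any number of threads agree on one value. The sketch below is illustrative only; the class CasConsensus and the method decide are hypothetical names, and it assumes proposals are never null.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a consensus object built from one CAS cell.
final class CasConsensus<T> {
    // null means "no decision yet" (assumption: proposals are non-null).
    private final AtomicReference<T> decided = new AtomicReference<>();

    // Any number of threads may call this; all of them get the same answer.
    T decide(T proposal) {
        // Only the first CAS from null can succeed; later ones fail harmlessly.
        decided.compareAndSet(null, proposal);
        return decided.get();
    }
}
```

There is no retry loop: each caller performs one CAS and one read, so every thread finishes in a bounded number of steps regardless of how many threads take part.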

The Ubiquitous CAS
Compare and Set
- Atomic, Infinite consensus number
- Pseudo-code, normally one instruction:

    boolean compareAndSet(ValueType *p, ValueType expectedValue, ValueType newValue) { … }

- Java implementation invokes secret native code:

    class AtomicInteger {
        public final boolean compareAndSet(int expect, int update) {
            return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
        }
        …
    }

The Ubiquitous CAS
Compare and Set
- Definition: Atomically change a given memory location to a given new value if it has a given expected value, and return true iff the change took place.
- Consensus infinity is expensive.
  - Memory bus is locked for all cores: slow
- x86, x64 instruction (with lock prefix byte for SMP):
    LOCK; CMPXCHG ptr, expected, new
  (schematic: the real instruction takes the expected value implicitly in the accumulator register)
- Can implement primitives with lower consensus numbers like AtomicInteger.getAndIncrement()
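As a sketch of how CAS can implement primitives with lower consensus numbers, the following builds swap and test-and-set out of nothing but compareAndSet. The class CasDerived and its method names are hypothetical. The JDK already offers getAndSet on its atomic classes, so this is for illustration rather than for use.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helpers showing consensus-2 primitives derived from CAS.
final class CasDerived {
    private CasDerived() {}

    // swap: store newValue and return whatever was there before.
    static int swap(AtomicInteger cell, int newValue) {
        for (;;) {
            int current = cell.get();
            if (cell.compareAndSet(current, newValue)) {
                return current;
            }
        }
    }

    // test-and-set: set the flag to true and return its previous state.
    static boolean testAndSet(AtomicBoolean flag) {
        for (;;) {
            boolean current = flag.get();
            if (flag.compareAndSet(current, true)) {
                return current;
            }
        }
    }
}
```

In both methods the CAS only succeeds if the cell still holds the value just read, which is exactly the definition given above: change the location if it has the expected value, and report whether the change took place.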

AtomicInteger from Java library source code: lock-free (has retry loop)

    /**
     * Atomically increments by one the current value.
     *
     * @return the previous value
     */
    public final int getAndIncrement() {
        for (;;) {
            int current = get();
            int next = current + 1;
            if (compareAndSet(current, next))
                return current;
        }
    }
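The same read, compute, CAS, retry shape works for other read-modify-write operations, not just increment. As an illustration only, here is an "atomic maximum" written in that style; the class AtomicMax and the method updateMax are hypothetical names and do not appear in the slides.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example of the getAndIncrement retry pattern applied to max().
final class AtomicMax {
    private AtomicMax() {}

    // Raises max to candidate if candidate is larger; returns the resulting maximum.
    static int updateMax(AtomicInteger max, int candidate) {
        for (;;) {
            int current = max.get();
            if (candidate <= current) {
                return current;          // nothing to change, no CAS needed
            }
            if (max.compareAndSet(current, candidate)) {
                return candidate;        // our value is now the maximum
            }
            // Another thread changed max between get() and CAS: re-read and retry.
        }
    }
}
```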

ConcurrentSkipListMap
Leaf node structure from Java source code

    static final class Node<K,V> {
        final K key;
        volatile Object value;
        volatile Node<K,V> next;
    }

ConcurrentSkipListMap from Java source code comments

    * Here's the sequence of events for a deletion of node n with
    * predecessor b and successor f, initially:
    *
    *        +------+       +------+      +------+
    *   ...  |   b  |------>|   n  |----->|   f  | ...
    *        +------+       +------+      +------+
    *
    * 1. CAS n's value field from non-null to null.
    *    From this point on, no public operations encountering
    *    the node consider this mapping to exist. However, other
    *    ongoing insertions and deletions might still modify
    *    n's next pointer.

ConcurrentSkipListMap from Java source code comments

    * 2. CAS n's next pointer to point to a new marker node.
    *    From this point on, no other nodes can be appended to n,
    *    which avoids deletion errors in CAS-based linked lists.
    *
    *        +------+       +------+      +------+       +------+
    *   ...  |   b  |------>|   n  |----->|marker|------>|   f  | ...
    *        +------+       +------+      +------+       +------+
    *

ConcurrentSkipListMap from Java source code comments

    * 3. CAS b's next pointer over both n and its marker.
    *    From this point on, no new traversals will encounter n,
    *    and it can eventually be GCed.
    *        +------+                                    +------+
    *   ...  |   b  |----------------------------------->|   f  | ...
    *        +------+                                    +------+
    *
    * A failure at step 1 leads to simple retry due to a lost race
    * with another operation. Steps 2-3 can fail because some other
    * thread noticed during a traversal a node with null value and
    * helped out by marking and/or unlinking. This helping-out
    * ensures that no thread can become stuck waiting for progress of
    * the deleting thread. The use of marker nodes slightly
    * complicates helping-out code because traversals must track
    * consistent reads of up to four nodes (b, n, marker, f), not
    * just (b, n, f), although the next field of a marker is
    * immutable, and once a next field is CAS'ed to point to a
    * marker, it never again changes, so this requires less care.
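The three deletion steps can be sketched on a plain singly linked list. This is a heavily simplified illustration, not the JDK code: SketchNode, DeletionSketch, markerBefore and delete are hypothetical names, AtomicReference objects stand in for the volatile fields of the real Node, and the sketch leaves out the skip-list index levels and the helping-out done by traversals.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical simplified node; the real Node uses volatile fields instead.
final class SketchNode {
    final String key;
    final AtomicReference<Object> value;
    final AtomicReference<SketchNode> next;
    final boolean marker;               // true only for marker nodes

    SketchNode(String key, Object value, SketchNode next) {
        this(key, value, next, false);
    }

    private SketchNode(String key, Object value, SketchNode next, boolean marker) {
        this.key = key;
        this.value = new AtomicReference<>(value);
        this.next = new AtomicReference<>(next);
        this.marker = marker;
    }

    // A marker carries no key or value; it only points at the successor f.
    static SketchNode markerBefore(SketchNode successor) {
        return new SketchNode(null, null, successor, true);
    }
}

final class DeletionSketch {
    private DeletionSketch() {}

    // Deletes n, whose predecessor is believed to be b.
    // Returns true if this call performed the logical deletion (won step 1).
    static boolean delete(SketchNode b, SketchNode n) {
        // Step 1: CAS n's value from non-null to null (logical deletion).
        Object v;
        do {
            v = n.value.get();
            if (v == null) {
                return false;           // another thread already deleted n
            }
        } while (!n.value.compareAndSet(v, null));

        // Step 2: CAS n's next pointer to a new marker so nothing can be appended to n.
        for (;;) {
            SketchNode f = n.next.get();
            if (f != null && f.marker) {
                break;                  // a helper already marked n
            }
            if (n.next.compareAndSet(f, SketchNode.markerBefore(f))) {
                break;
            }
            // CAS failed: n.next changed (an insertion or a helper); re-read and retry.
        }

        // Step 3: CAS b's next pointer over both n and its marker.
        SketchNode m = n.next.get();    // the marker; never changes once set
        // This CAS may fail if b is no longer n's predecessor or a helper already
        // unlinked n. The real code recovers by re-traversing; this sketch
        // simply leaves the unlinking to someone else in that case.
        b.next.compareAndSet(n, m.next.get());
        return true;
    }
}
```

In a single-threaded run all three CAS operations succeed on the first attempt, and b ends up pointing directly at f, matching the final diagram in the comment above.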

Volatile Variables
Vital, little understood. We consider Java 'volatile' here.
Necessary for inter-thread visibility (also in C#)

    class MyClass {
        // not volatile: only the writing thread is guaranteed to see updates
        int i;
        // vi can be seen by any thread
        volatile int vi;
        // Java array elements are not volatile!
        volatile int[] va = new int[SIZE];
        // only the reference is volatile
        volatile ArrayList val = new ArrayList();
        // a synchronized method loads and stores all variables
        public synchronized void set(int newI) { i = newI; }
        …
    }
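A common use of volatile for inter-thread visibility is a stop flag. The sketch below is illustrative and uses hypothetical names (StopFlagDemo, runFor). The worker spins until another thread sets the volatile field. If the field were not volatile, the Java memory model would not guarantee that the worker ever observes the write.

```java
// Hypothetical demo class; not from the slides.
final class StopFlagDemo {
    // volatile: a write by one thread becomes visible to reads in other threads.
    private volatile boolean stop;

    // Plain field: only the worker writes it, and the caller reads it after join().
    private long iterations;

    // Runs a worker for roughly the given time and returns how many loops it did.
    long runFor(long millis) {
        Thread worker = new Thread(() -> {
            while (!stop) {             // re-reads the volatile flag every pass
                iterations++;
            }
        });
        worker.start();
        try {
            Thread.sleep(millis);
            stop = true;                // volatile write: the worker will see it
            worker.join();              // join() makes the worker's writes visible here
        } catch (InterruptedException e) {
            stop = true;
            Thread.currentThread().interrupt();
        }
        return iterations;
    }
}
```

The plain iterations field is safe to read only because Thread.join() establishes a happens-before edge; the flag itself needs volatile because no such edge exists while the worker is still running.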

Volatile Variables
Vital, little understood.
- Some architectures re-order loads/stores to memory!
- 'As if' no change to the code but slower.
- Ensure loads and stores reach memory for inter-thread visibility (except for C, C++, where it's only for I/O)
- Locks and synchronized blocks do too, but they are slower and not lock-free.
- Not Atomic!
  - myVolatile++ by two threads may lose a count.
  - Use AtomicInteger instead.
- Generally much faster than CAS, atomics, locks.
  - Very fast, or free. (on x86, load is free on hardware)
- Consensus number 1
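The "not atomic" point can be demonstrated with a small experiment. In this sketch (LostUpdateDemo and run are hypothetical names), several threads increment both a volatile int and an AtomicInteger the same number of times. The atomic counter always reaches the full total, while the volatile counter may fall short because ++ is a separate read, add and write.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not from the slides.
final class LostUpdateDemo {
    static volatile int volatileCount;                       // visible, but ++ is not atomic
    static final AtomicInteger atomicCount = new AtomicInteger();

    private LostUpdateDemo() {}

    // Starts the given number of threads, each doing perThread increments of both counters.
    static void run(int threads, int perThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCount++;                         // read, add, write: updates can be lost
                    atomicCount.getAndIncrement();           // CAS retry loop: never loses a count
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }
}
```

After run(4, 50000), atomicCount holds exactly 200000. The value of volatileCount depends on timing: it can be anywhere up to 200000 and is often lower on a multi-core machine.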
