
Lock-Free, Wait-Free and Multi-core Programming, by Roger Deran



  1. Lock-Free, Wait-Free and Multi-core Programming
     Roger Deran, boilerbay.com
     Fast, Efficient Concurrent Maps: AirMap

  2. Lock-Free and Wait-Free Data Structures
     - Overview
     - The Java Maps that use Lock-Free techniques
     - Graphical performance of Map data structures
     - Consensus number concept
     - The Ubiquitous ‘CAS’ primitive
     - Implementing AtomicInteger using CAS
     - Implementing Java ConcurrentSkipListMap
     - Volatile variables – vital but confusing
     - Memory Barriers
     - so esoteric, we buy mutually free beer

  3. Lock-Free and Wait-Free Data Structures
     - For multiple threads sharing data
     - Fast
     - Extreme concurrency with many cores active
     - Extreme performance – no expensive wait queues
     - Extremely low latency (wait-free)
     - Constructed from very powerful, simple primitives
     - Algorithms difficult, so usually use canned ones
     - Active research on these precious techniques

  4. Lock-Free and Wait-Free Data Structures
     - Can implement fast locks with wait queues
       - Mutexes, RW locks, Semaphores, Condition Variables
     - Can implement fast Atomics
       - Integers, Longs, Booleans, References
     - Can implement multi-core data structures
       - HashMaps or Sets, Tree Maps or Sets, Queues, Lists, Stacks

  5. Lock-Free and Wait-Free Data Structures
     - Lock-Free
     - Not fair between threads
     - Always has a retry loop
     - Guarantees progress of some thread but not which one
     - Not a spin lock! Spins can almost stall the whole system
     - Wait-Free – beats Lock-Free
     - Fair between threads
     - Every thread is guaranteed to make progress in finite time
     - Rely on GC for unique ids, can generate much garbage
     - More difficult in C, C++, boost::lockfree (the ‘ABA’ problem)
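
The retry-loop pattern on this slide can be illustrated with a minimal sketch of a lock-free stack (the design usually called a Treiber stack). The class and method names are illustrative assumptions and do not come from the presentation. Each operation reads the head, builds the desired new head, and retries if its CAS loses a race. Because Java's garbage collector never recycles a node that is still referenced, the ‘ABA’ problem mentioned above does not arise in this form.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of a lock-free stack; names are hypothetical.
final class TreiberStack<E> {
    private static final class Node<E> {
        final E item;
        final Node<E> next;
        Node(E item, Node<E> next) { this.item = item; this.next = next; }
    }

    private final AtomicReference<Node<E>> head = new AtomicReference<>();

    // Lock-free push: some thread always succeeds, but a given thread may retry.
    void push(E item) {
        for (;;) {
            Node<E> oldHead = head.get();
            Node<E> newHead = new Node<>(item, oldHead);
            if (head.compareAndSet(oldHead, newHead)) return;
            // CAS failed: another thread changed head, so re-read and retry.
        }
    }

    // Lock-free pop: returns null when the stack is empty.
    E pop() {
        for (;;) {
            Node<E> oldHead = head.get();
            if (oldHead == null) return null;
            if (head.compareAndSet(oldHead, oldHead.next)) return oldHead.item;
        }
    }
}
```

Note that this is lock-free but not wait-free: an unlucky thread can in principle keep losing the CAS while others make progress.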

  6. The standard Java Map Classes
     The Concurrent* are Lock-Free and AirMap is Mostly Lock-Free.
     AirMap is a 90% faster, 50% more capacity, 7x faster iteration multi-core ConcurrentNavigableMap from boilerbay.com

  7. Map Feature Comparison

                               ConcurrentSkipListMap  ConcurrentHashMap  HashMap  TreeMap  AirMap
     put/get/remove                      x                    x             x        x       x
     ordered access                      x                                           x       x
     thread safe                         x                    x                              x
     most memory efficient                                                                   x
     fastest multicore access                                                                x

  8. Lock-Free Map Random Cumulative Put: decreasing exponential speed with Map size

  9. Lock-Free Map Concurrent Random Put

  10. Map Concurrent Random Access, Mixed: 4 thread put, 4 thread get

  11. Lock-Free 8-Thread Remove Speed: JVM size versus time shows GC efficiency

  12. Lock-Free One-Thread Iterator Speed

  13. Lock-Free One-Thread Iterator Speed: log scale shows the entire spectrum

  14. Map Entry size vs Map size: size of basic Key/Value entry in bytes given log Map size

  15. Consensus Number
      Any given concurrency primitive has one: how many threads can be synchronized?
      - Consensus 1: Surprisingly, memory is weak
        - Atomic read or write to memory. Dekker’s Algorithm
      - Consensus 2: Another surprise – many are weak
        - Queues, test-and-set, swap, getAndAdd, stacks
      - Consensus infinity: A few vital powerful primitives
        - Augmented queue – like socket poll
        - Compare And Set “CAS” type instruction
        - Load-Link and Store-Conditional instruction pair
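
To make "consensus infinity" concrete, here is a minimal sketch, assuming a one-shot agreement object built on Java's `AtomicReference`. The names `CasConsensus` and `ConsensusDemo` are hypothetical. Any number of threads each propose a value, exactly one CAS from `null` succeeds, and every caller returns that same winning value.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical one-shot consensus object: proposals must be non-null.
final class CasConsensus<T> {
    private final AtomicReference<T> decision = new AtomicReference<>();

    T decide(T proposal) {
        decision.compareAndSet(null, proposal); // only the first CAS succeeds
        return decision.get();                  // everyone reads the winner
    }
}

final class ConsensusDemo {
    // Runs threadCount proposers and collects the distinct values they decided on.
    static Set<Integer> decideConcurrently(int threadCount) {
        CasConsensus<Integer> consensus = new CasConsensus<>();
        Set<Integer> outcomes = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int proposal = i;
            workers[i] = new Thread(() -> outcomes.add(consensus.decide(proposal)));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // The set always has exactly one element; which proposal wins varies per run.
        System.out.println(decideConcurrently(8));
    }
}
```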

  16. The Ubiquitous CAS: Compare and Set
      Atomic, infinite consensus number.
      Pseudo-code, normally one instruction:

          boolean compareAndSet(ValueType *p, ValueType expectedValue, ValueType newValue) { … }

      Java implementation invokes secret native code:

          class AtomicInteger {
              public final boolean compareAndSet(int expect, int update) {
                  return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
              }
              …
          }

  17. The Ubiquitous CAS: Compare and Set
      - Definition: Atomically change a given memory location to a given new value if it has a given expected value, and return true iff the change took place.
      - Consensus infinity is expensive.
      - Memory bus is locked for all cores: slow
      - x86, x64 instruction (with lock prefix byte for SMP): LOCK; CMPXCHG ptr, expected, new
      - Can implement primitives with lower consensus numbers like AtomicInteger.getAndIncrement()
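
The definition above can be checked against the real `AtomicInteger.compareAndSet`. The sketch below also includes `ModelCasCell`, a hypothetical reference model that uses a lock purely to spell out the semantics. It is not how CAS is implemented, since the real thing is a single hardware instruction.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Reference model only (hypothetical class): the lock makes compare-then-store
// one indivisible step. Real CAS takes no lock.
final class ModelCasCell {
    private int value;

    ModelCasCell(int initial) { value = initial; }

    synchronized boolean compareAndSet(int expected, int newValue) {
        if (value != expected) return false; // unexpected value: no change
        value = newValue;                    // expected value: store and report success
        return true;
    }

    synchronized int get() { return value; }
}

final class CasSemanticsDemo {
    static String demo() {
        AtomicInteger real = new AtomicInteger(5);
        ModelCasCell model = new ModelCasCell(5);
        boolean r1 = real.compareAndSet(5, 6);  // matches: value becomes 6
        boolean m1 = model.compareAndSet(5, 6);
        boolean r2 = real.compareAndSet(5, 7);  // stale expectation: value stays 6
        boolean m2 = model.compareAndSet(5, 7);
        return r1 + " " + r2 + " " + real.get() + " | " + m1 + " " + m2 + " " + model.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true false 6 | true false 6"
    }
}
```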

  18. AtomicInteger from Java library source code: lock-free (has retry loop)

          /**
           * Atomically increments by one the current value.
           *
           * @return the previous value
           */
          public final int getAndIncrement() {
              for (;;) {
                  int current = get();
                  int next = current + 1;
                  if (compareAndSet(current, next))
                      return current;
              }
          }
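
The same read, compute, CAS, retry shape extends to other read-modify-write operations. As a minimal sketch, here is an "atomic maximum" written with that loop. The helper name `getAndSetMax` is an assumption for illustration and is not a JDK method.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasRetryLoops {
    // Hypothetical helper: atomically raises target to at least candidate
    // and returns the value seen before any change.
    static int getAndSetMax(AtomicInteger target, int candidate) {
        for (;;) {
            int current = target.get();
            if (current >= candidate) return current;          // already large enough
            if (target.compareAndSet(current, candidate)) return current;
            // CAS failed: another thread updated target, so re-read and retry.
        }
    }

    public static void main(String[] args) {
        AtomicInteger highWater = new AtomicInteger(3);
        System.out.println(getAndSetMax(highWater, 10)); // prints 3
        System.out.println(getAndSetMax(highWater, 4));  // prints 10
        System.out.println(highWater.get());             // prints 10
    }
}
```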

  19. ConcurrentSkipListMap leaf node structure, from Java source code

          static final class Node<K,V> {
              final K key;
              volatile Object value;
              volatile Node<K,V> next;
          }

  20. ConcurrentSkipListMap, from Java source code comments

          * Here's the sequence of events for a deletion of node n with
          * predecessor b and successor f, initially:
          *
          *        +------+       +------+      +------+
          *   ...  |   b  |------>|   n  |----->|   f  | ...
          *        +------+       +------+      +------+
          *
          * 1. CAS n's value field from non-null to null.
          *    From this point on, no public operations encountering
          *    the node consider this mapping to exist. However, other
          *    ongoing insertions and deletions might still modify
          *    n's next pointer.

  21. ConcurrentSkipListMap, from source code comments

          * 2. CAS n's next pointer to point to a new marker node.
          *    From this point on, no other nodes can be appended to n.
          *    which avoids deletion errors in CAS-based linked lists.
          *
          *        +------+       +------+      +------+       +------+
          *   ...  |   b  |------>|   n  |----->|marker|------>|   f  | ...
          *        +------+       +------+      +------+       +------+
          *

  22. ConcurrentSkipListMap, from Java source code comments

          * 3. CAS b's next pointer over both n and its marker.
          *    From this point on, no new traversals will encounter n,
          *    and it can eventually be GCed.
          *        +------+                                    +------+
          *   ...  |   b  |----------------------------------->|   f  | ...
          *        +------+                                    +------+
          *
          * A failure at step 1 leads to simple retry due to a lost race
          * with another operation. Steps 2-3 can fail because some other
          * thread noticed during a traversal a node with null value and
          * helped out by marking and/or unlinking. This helping-out
          * ensures that no thread can become stuck waiting for progress of
          * the deleting thread. The use of marker nodes slightly
          * complicates helping-out code because traversals must track
          * consistent reads of up to four nodes (b, n, marker, f), not
          * just (b, n, f), although the next field of a marker is
          * immutable, and once a next field is CAS'ed to point to a
          * marker, it never again changes, so this requires less care.
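
The three steps quoted on slides 20 to 22 can be sketched on a plain singly linked list. This is a simplified illustration under stated assumptions, not the JDK code: it uses `AtomicReference` fields instead of volatile fields, a hypothetical boolean flag to identify markers, and it omits the helping-out logic, insertion, and the skip-list index levels entirely.

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified, hypothetical sketch of marker-based deletion.
final class MarkerDeleteSketch {
    static final class Node {
        final String key;                       // null for marker nodes
        final boolean isMarker;                 // assumption: a flag instead of the JDK's encoding
        final AtomicReference<Object> value;
        final AtomicReference<Node> next;

        Node(String key, Object value, Node next) { this(key, value, next, false); }

        private Node(String key, Object value, Node next, boolean isMarker) {
            this.key = key;
            this.isMarker = isMarker;
            this.value = new AtomicReference<>(value);
            this.next = new AtomicReference<>(next);
        }

        static Node markerBefore(Node successor) {
            return new Node(null, null, successor, true);
        }
    }

    // Tries to delete n, whose predecessor is b. Returns true if this call
    // performed the logical deletion (step 1).
    static boolean delete(Node b, Node n) {
        // Step 1: CAS n's value from non-null to null (logical deletion).
        Object v = n.value.get();
        if (v == null || !n.value.compareAndSet(v, null)) {
            return false; // already deleted, or lost a race
        }
        // Step 2: CAS n's next pointer to a new marker, so nothing can be appended to n.
        Node f = n.next.get();
        if (n.next.compareAndSet(f, Node.markerBefore(f))) {
            // Step 3: CAS b's next pointer over both n and its marker.
            b.next.compareAndSet(n, f);
        }
        // If step 2 or 3 fails, the real implementation relies on helping
        // traversals to finish the unlink; this sketch simply stops here.
        return true;
    }
}
```

For example, given `b -> n -> f`, a call to `delete(b, n)` leaves `b.next` pointing at `f`, while `n` keeps pointing at its marker so that stale readers of `n` can still detect the deletion.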

  23. Volatile Variables
      Vital, little understood. We consider Java ‘volatile’ here.
      Necessary for inter-thread visibility (also in C#).

          class MyClass {
              // only one thread necessarily sees this
              int i;
              // vi can be seen by any thread
              volatile int vi;
              // Java array elements are not volatile!
              volatile int[] va = new int[SIZE];
              // only the reference is volatile
              volatile ArrayList val = new ArrayList();
              // synchronized loads, stores all variables
              public synchronized void set(int newI) { i = newI; }
              …
          }
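
A common use of the visibility guarantee described above is a stop flag shared between threads. The following is a minimal sketch with hypothetical names. Without `volatile` on `stopRequested`, the Java memory model would not guarantee that the worker ever observes the write, so the loop might never exit.

```java
final class VolatileFlagDemo {
    private volatile boolean stopRequested = false; // written by main, read by worker
    private long iterations = 0;                    // touched only by worker until join()

    // Starts a worker, lets it spin briefly, then stops it via the volatile flag.
    static long runBriefly() {
        VolatileFlagDemo demo = new VolatileFlagDemo();
        Thread worker = new Thread(() -> {
            while (!demo.stopRequested) {
                demo.iterations++;
            }
        });
        worker.start();
        try {
            Thread.sleep(50);
            demo.stopRequested = true; // volatile write: becomes visible to the worker
            worker.join();             // join() makes the worker's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.iterations;
    }

    public static void main(String[] args) {
        System.out.println("worker iterations before stop: " + runBriefly());
    }
}
```

The iteration count varies from run to run. The point of the sketch is only that the method returns at all.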

  24. Volatile Variables
      Vital, little understood. Some architectures re-order loads/stores to memory!
      - ‘As if’ no change to the code but slower.
      - Ensure loads and stores reach memory for inter-thread visibility (except for C, C++ it’s only for I/O)
      - Locks and synchronized blocks do too, but they are slower and not lock-free.
      - Not Atomic!
      - myVolatile++ by two threads may lose a count.
      - Use AtomicInteger instead.
      - Generally much faster than CAS, atomics, locks.
      - Very fast, or free. (on x86, load is free on hardware)
      - Consensus number 1
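
The "Not Atomic!" point can be demonstrated with a small sketch (class and field names are hypothetical). Several threads increment both a `volatile int` and an `AtomicInteger`. The atomic counter always ends at the exact total, while the volatile counter may end lower because `volatileCount++` is a separate read, add and write.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LostUpdateDemo {
    static volatile int volatileCount;
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Returns { final volatile count, final atomic count }.
    static int[] run(int threadCount, int incrementsPerThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    volatileCount++;               // visible, but not atomic: updates can be lost
                    atomicCount.getAndIncrement(); // atomic read-modify-write: never loses a count
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] result = run(4, 100_000);
        System.out.println("volatile: " + result[0] + " (may be below 400000)");
        System.out.println("atomic:   " + result[1]);
    }
}
```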
