
NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY, Tim Harris (PowerPoint presentation)



  1. NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 18 November 2016

  2. Lecture 7: linearizability; lock-free progress properties; queues; reducing contention; explicit memory management.

  3. Linearizability

  4. More generally: suppose we build a shared-memory data structure directly from read/write/CAS, rather than using locking as an intermediate layer. (Diagram: one stack layers the data structure over locks over the H/W primitives read, write, CAS, ...; the other builds the data structure directly over those primitives.) Why might we want to do this? And what does it mean for the data structure to be correct?

  5. What we're building: a set of integers, represented by a sorted linked list, with three operations:
     - find(int) -> bool
     - insert(int) -> bool
     - delete(int) -> bool

  6. Searching a sorted list. find(20): traverse from the head sentinel H, through 10 and 30, to the tail sentinel T; 20 is never encountered, so find(20) -> false.

  7. Inserting an item with CAS. insert(20): create a node 20 whose next pointer is 30, then CAS 10's next pointer from 30 to the new node. The CAS succeeds, so insert(20) -> true.

  8. Inserting an item with CAS, concurrently. insert(20) and insert(25) both read 10's next pointer as 30 and prepare CASes on it (30 -> 20 and 30 -> 25). Only one CAS can succeed; the loser re-traverses the list and retries, and the final list is H -> 10 -> 20 -> 25 -> 30 -> T.

  9. Searching and inserting together. find(20) traverses the list and reaches 30 without seeing 20, returning false, while a concurrent insert(20) succeeds in linking 20 in and returns true. This thread saw that 20 was not in the set... but that thread succeeded in putting it in! Is this a correct implementation of a set? Should the programmer be surprised if this happens? What about more complicated mixes of operations?

  10. Correctness criteria. Informally: look at the behaviour of the data structure (what operations are called on it, and what their results are). If this behaviour is indistinguishable from atomic calls to a sequential implementation, then the concurrent implementation is correct.

  11. Sequential specification. Ignore the list for the moment, and focus on the set. Sequential: we're only considering one operation on the set at a time. Specification: we're saying what a set does (find(int) -> bool, insert(int) -> bool, delete(int) -> bool), not what a list does, or how it looks in memory. For example, starting from {10, 20, 30}: insert(15) -> true gives {10, 15, 20, 30}; from there, delete(20) -> true gives {10, 15, 30}, while insert(20) -> false leaves {10, 15, 20, 30} unchanged.

  12. System model. Each high-level operation (e.g. lookup(20), insert(15)) is carried out as a series of primitive steps (read/write/CAS) spread out over time: the lookup reads H, then H->10, and so on before returning true; the insert reads H and H->10, prepares a new node, and publishes it with a CAS before returning true.

  13. High level: sequential history. No overlapping invocations: T1: insert(10) -> true (set becomes {10}); T2: insert(20) -> true ({10, 20}); T1: find(15) -> false (still {10, 20}).

  14. High level: concurrent history. Now allow overlapping invocations: Thread 1 runs insert(10) -> true and then insert(20) -> true, while Thread 2's find(20) -> false overlaps them in time.

  15. Linearizability. Is there a correct sequential history that (i) gives the same results as the concurrent one, and (ii) is consistent with the timing of the invocations and responses?

  16. Example: linearizable. Thread 1: insert(10) -> true, then insert(20) -> true; Thread 2: find(20) -> false, overlapping the second insert. A valid sequential history exists (e.g. insert(10), find(20), insert(20)), so this concurrent execution is OK.

  17. Example: linearizable. Thread 1: insert(10) -> true, then delete(10) -> true; Thread 2: find(10) -> false, overlapping. A valid sequential history exists (e.g. insert(10), delete(10), find(10)), so this concurrent execution is OK.

  18. Example: not linearizable. Thread 1: insert(10) -> true, then insert(10) -> false; Thread 2: delete(10) -> true, entirely between the two inserts. No valid sequential history exists: the delete must be ordered between the two inserts, so the second insert should have returned true.

  19. Returning to our example: insert(20) -> true running concurrently with find(20) -> false over the list H -> 10 -> 30 -> T. A valid sequential history exists (Thread 1's find(20) -> false ordered before Thread 2's insert(20) -> true), so this concurrent execution is OK.

  20. Recurring technique.
     - For updates: perform an essential step of the operation by a single atomic instruction, e.g. the CAS to insert an item into the list. This forms a "linearization point".
     - For reads: identify a point during the operation's execution when the result is valid; this is not always a specific instruction.

  21. Adding "delete". First attempt: just use CAS. delete(10): CAS H's next pointer from 10 to 30, unlinking the node.

  22. Delete and insert together. delete(10) prepares to CAS H's next pointer from 10 to 30, while insert(20) prepares to CAS 10's next pointer from 30 to a new node 20. Both CASes succeed: the delete unlinks 10, but the insert attached 20 to the node being removed, so 20 is silently lost from the list.

  23. Logical vs physical deletion. Use a 'spare' bit in each next pointer to indicate logically deleted nodes. delete(10) first CASes 10's next pointer from 30 to a marked 30 (10 is now logically deleted; this is the linearization point); a concurrent insert(20)'s CAS on that next pointer now fails, because the pointer no longer holds the plain value 30; a second CAS then physically unlinks the node.

  24. deleteany() -> int. From {10, 20, 30}, deleteany() may return 10 (leaving {20, 30}) or 20 (leaving {10, 30}). This is still a sequential spec... just not a deterministic one.

  25. Delete-greater-than-or-equal. DeleteGE(int x) -> int: remove "x", or the next element above "x". E.g. on the list H -> 10 -> 30 -> T, DeleteGE(20) -> 30, leaving H -> 10 -> T.

  26. Does this work? DeleteGE(20) on H -> 10 -> 30 -> T:
     1. Walk down the list, as in a normal delete, finding 30 as the next element after 20.
     2. Do the deletion as normal: set the mark bit in 30, then physically unlink it.

  27. Delete-greater-than-or-equal: the problem. Thread 1: A = insert(25) -> true, then B = insert(30) -> false; Thread 2: C = deleteGE(20) -> 30, overlapping both. B must be after A (thread order); C must be after B (otherwise B should have succeeded, since C removed the 30 that made B fail); and A must be after C (otherwise C should have returned 25). These constraints form a cycle, so no valid sequential history exists: this implementation of DeleteGE is not linearizable.

  28. Lock-free progress properties

  29. Progress: is this a good "lock-free" list?

        static volatile int MY_LIST = 0;

        bool find(int key) {
          // Wait until list available
          while (CAS(&MY_LIST, 0, 1) == 1) { }
          ...
          // Release list
          MY_LIST = 0;
        }

     OK, we're not calling pthread_mutex_lock... but we're essentially doing the same thing: this is just a hand-rolled spinlock.

  30. "Lock-free": a specific kind of non-blocking progress guarantee. It precludes the use of typical locks, whether from libraries or "hand rolled". The term is often mis-used informally as a synonym for: free from calls to a locking function; fast; scalable.

  31. The version number mechanism is an example of a technique that is often effective in practice, does not use locks, but is not lock-free in this technical sense.

  32. Wait-free: a thread finishes its own operation if it continues executing steps, regardless of what other threads do. (Timeline: every operation that starts also finishes.)

  33. Implementing wait-free algorithms. Important in some significant niches, e.g. real-time systems with worst-case execution time guarantees.
     - General construction techniques exist ("universal constructions"): queuing and helping strategies in which everyone ensures the oldest operation makes progress. These often have a high sequential overhead and limited scalability.
     - Fast-path / slow-path constructions: start out with a faster lock-free algorithm, and switch over to a wait-free algorithm if there is no progress. Done carefully, this obtains wait-free progress overall.
     - In practice, progress guarantees can vary between operations on a shared object, e.g. a wait-free find combined with a lock-free delete.

  34. Lock-free: some thread finishes its operation if threads continue taking steps. (Timeline: an individual operation may be delayed indefinitely, but the system as a whole makes progress.)

  35. A (poor) lock-free counter:

        int getNext(int *counter) {
          while (true) {
            int result = *counter;
            if (CAS(counter, result, result+1)) {
              return result;
            }
          }
        }

     Not wait-free: there is no guarantee that any particular thread's CAS will ever succeed.

  36. Implementing lock-free algorithms.
     - Ensure that one thread (A) only has to repeat work if some other thread (B) has made "real progress", e.g. insert(x) starts again only if a conflicting update has occurred.
     - Use helping to let one thread finish another's work, e.g. physically deleting a node on its behalf.

  37. Obstruction-free: a thread finishes its own operation if it runs in isolation. (Timeline: interference can prevent any operation from finishing.)

  38. A (poor) obstruction-free counter:

        int getNext(int *counter) {
          while (true) {
            int result = LL(counter);
            if (SC(counter, result+1)) {
              return result;
            }
          }
        }

     Assuming a very weak load-linked (LL) / store-conditional (SC), in which an LL on one thread will prevent an SC on another thread from succeeding: two interfering threads can livelock each other, but a thread running in isolation always completes.

  39. Building obstruction-free algorithms.
     - Ensure that none of the low-level steps leave the data structure "broken".
     - On detecting a conflict: help the other party finish, or get the other party out of the way.
     - Use contention management to reduce the likelihood of livelock.

  40. Hashtables and skiplists

  41. Hash tables. A bucket array with 8 entries in this example; each entry points to a sorted list of the items whose hash value modulo 8 selects that bucket. Bucket 0 holds 0 -> 16 -> 24, bucket 3 holds 3 -> 11, and bucket 5 holds 5.

  42. Hash tables: Contains(16). 1. Hash 16; use bucket 0. 2. Use normal list operations on that bucket's list (0 -> 16 -> 24).

  43. Hash tables: Delete(11). 1. Hash 11; use bucket 3. 2. Use normal list operations on that bucket's list (3 -> 11).

  44. Lessons from this hashtable. Informal correctness argument:
     - Operations on different buckets don't conflict: no extra concurrency control is needed between them.
     - Operations appear to occur atomically at the point where the underlying list operation occurs.
     - This is not specific to lock-free lists: the same structure works with a whole-table lock, per-list locks, etc.
