What consistency guarantees should concurrent data structure libraries provide?
Hans-J. Boehm
hboehm@google.com
Disclaimer This is half-baked and hand-wavy. Mostly questions, few answers. Reflects WG21/SG1 (C++ Concurrency) discussion. I may well be misrepresenting some other work. If so, please correct me!
The problem
C++ (among others) would like to add more concurrent data structure libraries, e.g. a concurrent queue.
We have to specify what correctness properties they have to satisfy.
Serializability of operations on the data structure is
● Slightly too strong: construction and destruction aren’t allowed to race.
● Usually too weak: operations over multiple data structures don’t interact correctly.
Linearizability doesn’t address all the issues.
Data structure interaction: Problem 1
    Thread 1:              Thread 2:
    x.add(1);  // 2nd      if (!y.is_empty())          // 2nd
    y.add(1);  // 1st        assert(!x.is_empty());    // 1st
Operations on x and y can be individually serializable, and the assertion can still fail, e.g. if x and y are each represented by a single atomic and use memory_order_relaxed operations.
That’s why we have linearizability ...
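A minimal C++ sketch of Problem 1, assuming each container is just one atomic flag. The names flag_set and assertion_holds are hypothetical and not taken from any proposal. The memory orders are parameters, so the same code models the relaxed variant (where the assertion may fail) and a release/acquire variant (where it cannot).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical one-bit "set": its only state is whether anything was added.
// The memory order is a parameter so weak and strong variants share the code.
class flag_set {
public:
    void add(std::memory_order order) { nonempty_.store(true, order); }
    bool is_empty(std::memory_order order) const { return !nonempty_.load(order); }
private:
    std::atomic<bool> nonempty_{false};
};

// Runs the two threads from the slide once. Returns false only when thread 2
// saw y non-empty but x empty, i.e. when the slide's assertion would fail.
bool assertion_holds(std::memory_order store_order, std::memory_order load_order) {
    flag_set x, y;
    bool ok = true;
    std::thread t1([&] {
        x.add(store_order);
        y.add(store_order);
    });
    std::thread t2([&] {
        if (!y.is_empty(load_order)) {
            ok = !x.is_empty(load_order);
        }
    });
    t1.join();
    t2.join();
    return ok;  // written by t2 before join(), so reading it here is race-free
}
```

With memory_order_release stores and memory_order_acquire loads (or memory_order_seq_cst for both) the function always returns true. With memory_order_relaxed the language permits a false result, although a given compiler and machine may never produce one.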
Interaction with ordinary data accesses: Problem 2
    Thread 1:              Thread 2:
    x = 1;                 if (!y.is_empty())    // 2nd
    y.add(1);  // 1st        assert(x == 1);     // 1st
Can the accesses to x race?
Operations on y can be individually serializable, but fail to ensure visibility for other memory accesses.
Particularly important for e.g. threads communicating via a concurrent queue or work-stealing queue.
Some careful academic papers (e.g. Batty, Dodds, Gotsman, POPL13), but often ignored.
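A sketch of Problem 2 with a lock-based container, where the visibility guarantee comes for free: the mutex unlock in add() synchronizes with the later lock in is_empty(). The names locked_bag and read_after_handoff are hypothetical. Unlike the slide, thread 2 waits in a loop instead of testing once, so that the result is deterministic.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical lock-based container; method names follow the slide.
class locked_bag {
public:
    void add(int v) {
        std::lock_guard<std::mutex> guard(m_);
        items_.push_back(v);
    }
    bool is_empty() const {
        std::lock_guard<std::mutex> guard(m_);
        return items_.empty();
    }
private:
    mutable std::mutex m_;
    std::deque<int> items_;
};

// Thread 1 writes the ordinary variable x and then adds to y. Thread 2 waits
// until it sees y non-empty and then reads x. Returns the value thread 2 read.
int read_after_handoff() {
    int x = 0;  // ordinary, non-atomic data
    locked_bag y;
    int seen = -1;
    std::thread t1([&] {
        x = 1;
        y.add(1);
    });
    std::thread t2([&] {
        while (y.is_empty()) {
            std::this_thread::yield();
        }
        seen = x;  // ordered after x = 1 by the unlock/lock pair on the mutex
    });
    t1.join();
    t2.join();
    return seen;
}
```

A container that is serializable but built on relaxed atomics would give no such ordering, and the two accesses to x would then be a data race.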
So what’s hard here?
1) Linearizability definition relies on interleaving-based concurrency.
   ○ Doesn’t reflect modern mainstream memory models.
   ○ Meshes particularly badly with non-multi-copy-atomic architectures (Power, GPUs)?
   ○ In C++, there is no sequential execution history corresponding to a parallel execution.
2) No clear consensus about the right answers, particularly for the second problem.
3) Java typically guarantees that a data structure write “synchronizes with” a reader that sees the write. (Doug Lea’s approach)
   ○ Can be too strong or too weak.
This is an obstacle for concurrent data structures in C++.
Some examples
Queues (e.g. wg21.link/P0260, mostly Lawrence Crowl’s work)
Option 1: Doug Lea’s Java approach.
Problem: Doesn’t mesh with “sequential consistency by default” philosophy.
Notably (from wg21.link/P0387, cf. 2+2W litmus test):
    Thread 1:                   Thread 2:
    q.push(1);                  log.push("pushed 2");
    log.push("pushed 1");       q.push(2);
log and q are both queues.
log may contain “pushed 1”; “pushed 2” while q contains 2; 1.
Good enough? Does it matter in practice?
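To see why this outcome clashes with “sequential consistency by default”, the following sketch enumerates every interleaving of the four pushes and checks whether any of them yields the outcome above. It is a plain sequential simulation, not a model of a real queue. The names op and outcome_has_interleaving are hypothetical, and the log strings are abbreviated to the integers 1 and 2.

```cpp
#include <cassert>
#include <vector>

// One operation: push `value` onto queue 'q' or onto the log 'l'.
struct op { char target; int value; };

// Returns true if some interleaving that respects both program orders ends
// with log == [1, 2] ("pushed 1"; "pushed 2") and q == [2, 1].
bool outcome_has_interleaving() {
    const op t1[2] = {{'q', 1}, {'l', 1}};  // q.push(1); log.push("pushed 1");
    const op t2[2] = {{'l', 2}, {'q', 2}};  // log.push("pushed 2"); q.push(2);
    // A 4-bit mask with exactly two set bits marks the steps taken by thread 2.
    for (unsigned mask = 0; mask < 16; ++mask) {
        int set_bits = 0;
        for (int b = 0; b < 4; ++b) set_bits += (mask >> b) & 1u;
        if (set_bits != 2) continue;
        std::vector<int> q, log_q;
        int i1 = 0, i2 = 0;
        for (int step = 0; step < 4; ++step) {
            const op& o = ((mask >> step) & 1u) ? t2[i2++] : t1[i1++];
            (o.target == 'q' ? q : log_q).push_back(o.value);
        }
        if (log_q == std::vector<int>{1, 2} && q == std::vector<int>{2, 1}) {
            return true;
        }
    }
    return false;
}
```

None of the six interleavings produces the outcome, so a queue that allows it cannot be explained to users as “operations appear to happen one at a time in some order”.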
Queues, contd.
Option 2: Treat data structures like atomic values:
● All library data structure operations and atomic operations appear to execute in a single total order.
● Writers synchronize with readers that see the result.
Note: still doesn’t prevent reordering of the assignments in
    x =rlx 1; q.push(...); y =rlx 1;
Implicitly guaranteed by lock-based implementations.
Is it fast enough? Does it affect implementations?
Counters (wg21.link/P0261, also primarily L. Crowl’s work, is a much more elaborate proposal)
● Statistics counters, read only at end: don’t need memory ordering guarantees. If increments don’t return a value, memory ordering is not observable.
● Statistics counters, concurrently read (values readable for printing etc.): probably still don’t need memory ordering guarantees?
● Counter used as queue index: needs acquire/release ordering.
● Counter (ab)used to implement a lock: needs full sequential consistency.
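A sketch of the first case, a statistics counter that is read only at the end. The names stat_counter and count_events are hypothetical and this is not the P0261 interface. Increments return nothing and use memory_order_relaxed; the final read is ordered after all increments by the thread joins, not by the counter itself.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical statistics counter: increments return no value, so their
// memory ordering cannot be observed and relaxed operations suffice.
class stat_counter {
public:
    void increment() { count_.fetch_add(1, std::memory_order_relaxed); }
    long read() const { return count_.load(std::memory_order_relaxed); }
private:
    std::atomic<long> count_{0};
};

// Starts `threads` workers that each increment `per_thread` times and
// returns the total read after all workers have finished.
long count_events(int threads, int per_thread) {
    stat_counter c;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&c, per_thread] {
            for (int i = 0; i < per_thread; ++i) c.increment();
        });
    }
    for (auto& w : workers) w.join();
    return c.read();  // join() orders every increment before this read
}
```

No increment is lost even with relaxed ordering, because each fetch_add is still a single atomic read-modify-write.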
Possible approaches
C++ atomics library allows each operation to specify a memory ordering guarantee.
● Very flexible.
● Interactions turned out to be subtle. (acq/rel vs SC)
● Does this make sense for general data structures? E.g. if you needed sequential consistency everywhere, might you use a lock instead?
Does it make sense to templatize the data structure w.r.t. memory_order?
For counters, these distinctions matter. For queues, are the cost differences too small to worry about?
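One way the “templatize w.r.t. memory_order” question could look for a counter, as a sketch only. basic_counter and the three aliases are hypothetical names, not part of any proposal. The ordering is fixed per type instead of per call.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counter whose memory ordering is chosen once, per type.
template <std::memory_order Order = std::memory_order_seq_cst>
class basic_counter {
public:
    // Returns the value held before the increment.
    long fetch_increment() { return value_.fetch_add(1, Order); }
    long load() const { return value_.load(load_order()); }
private:
    // A load may not use release or acq_rel, so those map to acquire.
    static constexpr std::memory_order load_order() {
        return (Order == std::memory_order_release ||
                Order == std::memory_order_acq_rel)
                   ? std::memory_order_acquire
                   : Order;
    }
    std::atomic<long> value_{0};
};

// One alias per use case from the counters slide.
using relaxed_counter = basic_counter<std::memory_order_relaxed>;  // statistics
using index_counter = basic_counter<std::memory_order_acq_rel>;    // queue index
using sc_counter = basic_counter<>;                                // lock-like use
```

The design choice here is that callers pick a type, such as relaxed_counter, and cannot mix orderings on one object, which sidesteps some of the subtle acq/rel vs SC interactions at the cost of flexibility.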
Likely WG21 approach
(May be my wishful thinking.)
Where feasible, try the strong approach in a “Technical Specification”:
● Guarantees are initially modelled on SC atomics.
● Add relaxed versions with relaxed memory ordering (as well as possibly other semantic relaxations) if performance issues arise.
Otherwise try to hide weak ordering behind nondeterminism.
Questions / Discussion?