
<atomic.h> weapons, Paolo Bonzini, Red Hat, Inc., KVM Forum


  1. <atomic.h> weapons
     Paolo Bonzini, Red Hat, Inc.
     KVM Forum 2016

  2. The real things
     ● Herb Sutter’s talks
       ● atomic<> Weapons: The C++ Memory Model and Modern Hardware
       ● Lock-Free Programming (or, Juggling Razor Blades)
     ● The C11 and C++11 standards
       ● N2429: Concurrency memory model
       ● N2480: A Less Formal Explanation of the Proposed C++ Concurrency Memory Model
     Paolo Bonzini – KVM Forum 2016

  3. Outline
     ● Who ordered atomics?
     ● Compilers and the need for a memory model
     ● qemu/atomic.h: portable atomics in QEMU
     ● Future work

  4. Outline
     ● Who ordered atomics?
     ● Compilers and the need for a memory model
     ● qemu/atomic.h: portable atomics in QEMU
     ● Future work

  5. Why atomics?
     ● Coarse locks are simple, but scale badly
     ● Finer-grained locks introduce problems too
       ● Not easily composable (“leaf” locks are fine, nesting can result in deadlocks)
       ● Taking a lock many times is slow
     ● Atomics are like extremely fine-grained locks, but faster

  6. What do atomics provide?
     ● Ordering of reads and writes
     ● Atomic compare-and-swap, like this:
         T atomic_cmpxchg(T *p, T expected, T desired)
         {
             /* the whole body executes as a single atomic step */
             T old = *p;
             if (old == expected)
                 *p = desired;
             return old;
         }
     ● Everything else can be built on top of these
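As a hedged illustration of "everything else can be built on top of these", the sketch below builds an atomic maximum from a compare-and-swap retry loop. It uses C11 `<stdatomic.h>` rather than QEMU's own macros, and `cmpxchg_int` and `fetch_max_int` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical wrapper with the slide's semantics: returns the old value,
 * and stores `desired` only if the old value equals `expected`. */
static int cmpxchg_int(atomic_int *p, int expected, int desired)
{
    /* On failure C11 writes the observed value back into `expected`;
     * on success `expected` already equals the old value. */
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}

/* Hypothetical helper: atomically raise *p to at least `value` and
 * return the value that was there before. */
static int fetch_max_int(atomic_int *p, int value)
{
    int old = atomic_load(p);
    while (old < value) {
        int seen = cmpxchg_int(p, old, value);
        if (seen == old)
            break;      /* our store won */
        old = seen;     /* another thread changed *p: retry with fresh data */
    }
    return old;
}

/* Small single-threaded usage example. */
static void fetch_max_demo(void)
{
    atomic_int v;
    atomic_init(&v, 3);
    assert(fetch_max_int(&v, 5) == 3);   /* 3 is raised to 5 */
    assert(fetch_max_int(&v, 4) == 5);   /* 5 is already larger, unchanged */
}
```

The same retry pattern (load, compute, compare-and-swap, repeat on failure) works for any read-modify-write operation that fits in one atomic word.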

  7. When to use atomics?
     ● When threads communicate at well-defined points
       ● Example: ring buffers
     ● When consistency requirements are minimal
       ● Example: accumulating statistics
     ● When complexity is easily abstracted
       ● Example: synchronization primitives, data structures
     ● For the fast path only
       ● Example: RCU, seqlock, pthread_once

  8. Outline
     ● Who ordered atomics?
     ● Compilers and the need for a memory model
     ● qemu/atomic.h: portable atomics in QEMU
     ● Future work

  9. Compiler writers are your friends
     As written:
         int i; char *a;
         a[i+4] = 1;
     As optimized:
         movb $1, 4(%rsi,%rdi)
     As written:
         int n, *a;
         for (int i = 0; i <= n; i++)
             a[i] = 0;
     As optimized:
         int n, *a;
         for (int *end = &a[n]; a <= end; )
             *a++ = 0;
     As written:
         int **a;
         for (int i = 0; i < M; i++)
             for (int j = 0; j < N; j++)
                 a[i][j] = 42;
     As optimized:
         int **a;
         for (int i = 0; i < M; i++)
             for (int *row = a[i], j = 0; j < N; j++)
                 row[j] = 42;

  10. Compiler writers are your friends (but they need some help too)
      As written:
          int i; char *a;
          a[i+4] = 1;
      As optimized:
          movb $1, 4(%rsi,%rdi)
      Caveat: assumes no overflow in i+4!
      As written:
          int n, *a;
          for (int i = 0; i <= n; i++)
              a[i] = 0;
      As optimized:
          int n, *a;
          for (int *end = &a[n]; a <= end; )
              *a++ = 0;
      Caveat: infinite loop if n == INT_MAX?
      As written:
          int **a;
          for (int i = 0; i < M; i++)
              for (int j = 0; j < N; j++)
                  a[i][j] = 42;
      As optimized:
          int **a;
          for (int i = 0; i < M; i++)
              for (int *row = a[i], j = 0; j < N; j++)
                  row[j] = 42;
      Caveat: what if a[i][j] overwrites a[i]?
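The second transformation can be checked directly for inputs where it is valid. The sketch below, with hypothetical function names, writes the indexed and the pointer-based loop by hand and confirms that they store the same values when `a[0..n]` is a real array and `n` is far from `INT_MAX`. It says nothing about the overflow case, which is exactly the case the compiler is allowed to ignore.

```c
#include <assert.h>
#include <string.h>

/* Indexed form, as written in the source. Clears a[0..n] inclusive. */
static void zero_indexed(int *a, int n)
{
    for (int i = 0; i <= n; i++)
        a[i] = 0;
}

/* Pointer form, as a compiler might rewrite it. Performs the same
 * stores as long as a[0..n] is a valid array and n < INT_MAX. */
static void zero_pointer(int *a, int n)
{
    for (int *end = &a[n]; a <= end; )
        *a++ = 0;
}

/* Hypothetical check: both forms leave identical array contents. */
static int forms_agree(void)
{
    int x[8], y[8];
    memset(x, 0x5a, sizeof x);
    memset(y, 0x5a, sizeof y);
    zero_indexed(x, 4);   /* clears x[0..4] */
    zero_pointer(y, 4);   /* clears y[0..4] */
    return memcmp(x, y, sizeof x) == 0 && x[4] == 0 && x[5] != 0;
}
```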

  11. The hard truth about undefined behavior
      ● You don’t want the compiler to execute the program you wrote
      ● Most undefined behavior is obvious
      ● Some undefined behavior makes sense, but is hard to reason about
      ● Some undefined behavior seems to make no sense, but really should be left undefined

  12. Sequential consistency (Lamport, 1979)
      ● The result of any execution is the same as if reads and writes occurred in some total order
      ● Operations from each individual processor are ordered the same as they appear in the program
      (a)
          static int a;
          int x = ++a;
          f();
          return x;
      (b)
          static int a;
          f();
          return ++a;

  13. Sequential consistency (Lamport, 1979)
      ● The result of any execution is the same as if reads and writes occurred in some total order
      ● Operations from each individual processor are ordered the same as they appear in the program
          long long x = 0;
          // thread 1
          x = -1;
          // thread 2
          printf("%lld", x);
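With a plain `long long`, a 32-bit target may perform the store as two 32-bit halves, so the reader could observe a value that is neither 0 nor -1, and in C11 terms the unsynchronized access is a data race. A minimal sketch of one fix, assuming C11 `<stdatomic.h>` and POSIX threads are available, makes the variable atomic; `shared_x`, `store_minus_one` and `read_while_writing` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static _Atomic long long shared_x;

static void *store_minus_one(void *arg)
{
    (void)arg;
    atomic_store(&shared_x, -1LL);   /* all 64 bits change in one step */
    return NULL;
}

/* Hypothetical driver: returns what the reading side observed while
 * the writer thread may or may not have run yet. */
static long long read_while_writing(void)
{
    pthread_t t;
    long long seen;

    atomic_store(&shared_x, 0);
    pthread_create(&t, NULL, store_minus_one, NULL);
    seen = atomic_load(&shared_x);   /* either the old or the new value */
    pthread_join(t, NULL);
    return seen;
}
```

Which of the two values the reader sees depends on scheduling; the atomic type only rules out the half-written ones.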

  14. Sequential consistency (Lamport, 1979)
      ● The result of any execution is the same as if reads and writes occurred in some total order
      ● Operations from each individual processor are ordered the same as they appear in the program

  15. The C/C++ approach
      ● You also don’t want the processor to execute the program that you wrote
      ● Processor “optimizations” can be described by rearranging loads and stores in the source code
      ● Can the same tools let you reason about both compiler- and processor-level transformations?
      ● Unions, pointers, casts: with great power comes great responsibility

  16. The C/C++ approach
      ● Programs must be race-free
        ● The standard precisely defines data races
        ● The semantics of data races are left undefined
      ● If the program is “compiler-correct”, it’s also “processor-correct”
      ● If the program is correct, its executions are all sequentially consistent
      ● … unless you turn on the guru switch

  17. Happens-before (Lamport, 1978)
      ● Captures causal dependencies between events
      ● For any two events e1 and e2, exactly one is true:
        ● e1 → e2 (e1 happens before e2)
        ● e2 → e1 (e2 happens before e1)
        ● e1 || e2 (e1 is concurrent with e2)
      ● Data race: concurrent accesses to the same memory location, at least one of them a write and at least one of them non-atomic

  18. More precisely...
      ● If a thread’s “load-acquire” sees a “store-release” from another thread, the store synchronizes with the load
        ▶ The store then happens before the load
      ● Within a single thread, program order provides the happens-before relation
      ● Happens-before is transitive
        ▶ Everything before the store-release happens before everything after the load-acquire

  19. Example: data-race free, correct
          // thread 1
          foo->a = 1;
              (happens-before)
          atomic_store_release(&x, foo);
              (happens-before)
          // thread 2
          bar = atomic_load_acquire(&x);
              (happens-before)
          return foo->a;
      ● No concurrent accesses
      ● No data race!
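The publish/consume pattern on this slide can be sketched in portable C11. This is an illustration under stated assumptions, not QEMU code: `atomic_store_explicit` and `atomic_load_explicit` stand in for `atomic_store_release` and `atomic_load_acquire`, POSIX threads provide the two threads, and `publisher`, `subscriber` and `run_message_passing` are hypothetical names. The reader here dereferences the pointer it loaded, which is the usual form of the pattern.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo { int a; };

static struct foo payload;
static _Atomic(struct foo *) x;     /* NULL until the payload is published */

static void *publisher(void *arg)
{
    (void)arg;
    payload.a = 1;                  /* plain, non-atomic write */
    /* Release: everything above happens before a load-acquire that
     * observes this store. */
    atomic_store_explicit(&x, &payload, memory_order_release);
    return NULL;
}

static void *subscriber(void *arg)
{
    int *result = arg;
    struct foo *bar;

    /* Acquire: spin until the pointer has been published. */
    while ((bar = atomic_load_explicit(&x, memory_order_acquire)) == NULL) {
        /* not published yet */
    }
    *result = bar->a;               /* plain read, ordered after payload.a = 1 */
    return NULL;
}

/* Hypothetical driver: returns the value the subscriber observed. */
static int run_message_passing(void)
{
    pthread_t p, s;
    int seen = 0;

    pthread_create(&s, NULL, subscriber, &seen);
    pthread_create(&p, NULL, publisher, NULL);
    pthread_join(p, NULL);
    pthread_join(s, NULL);
    return seen;
}
```

Because the subscriber only reads `bar->a` after its load-acquire has seen the store-release, the plain write and the plain read are never concurrent.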

  20. Example: data-race, undefined behavior (I)
          // thread 1
          foo->a = 1;
              (happens-before)
          x = foo;
              (concurrent)
          // thread 2
          bar = x;
              (happens-before)
          return foo->a;
      ● Concurrent non-atomic accesses, one a write
      ● Data race → undefined behavior!

  21. Example: data-race, undefined behavior (II)
          // thread 1
          foo->a = 1;
              (happens-before)
          atomic_store_relaxed(&x, foo);
              (concurrent)
          // thread 2
          bar = atomic_load_relaxed(&x);
              (happens-before)
          return foo->a;
      ● Concurrent atomic accesses to x, one a write
        ● No data race!
      ● Concurrent non-atomic accesses to foo->a, one a write
        ● Data race → undefined behavior!

  22. Example: relaxed, data-race free
          // thread 1
          atomic_inc(&bs->nr_reads);
              (concurrent)
          // thread 2
          stats->reads = atomic_read(&bs->nr_reads);
      ● Concurrent atomic accesses, one a write
      ● No data race! But not sequentially consistent
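A statistics counter of this kind can be sketched with C11 relaxed operations. The code below assumes POSIX threads, and `nr_reads`, `do_reads` and `count_reads` are hypothetical names rather than QEMU's block-layer code. Relaxed increments never lose updates, so the final total is exact even though the operations order nothing else.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum { NTHREADS = 4, READS_PER_THREAD = 100000 };

static atomic_long nr_reads;

static void *do_reads(void *arg)
{
    (void)arg;
    for (int i = 0; i < READS_PER_THREAD; i++) {
        /* Relaxed: the increment is atomic, but it imposes no ordering
         * on the surrounding loads and stores. */
        atomic_fetch_add_explicit(&nr_reads, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Hypothetical driver: runs the workers and returns the final count. */
static long count_reads(void)
{
    pthread_t t[NTHREADS];

    atomic_store(&nr_reads, 0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, do_reads, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    /* pthread_join orders every worker's increments before this load. */
    return atomic_load_explicit(&nr_reads, memory_order_relaxed);
}
```

A reader that samples `nr_reads` while the workers are still running gets some valid intermediate count, which is usually all a statistics display needs.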

  23. Acquire/release as optimization barriers
          // thread 1
          foo->a = 1;
              (happens-before)
          ▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲
          atomic_store_release(&x, foo);
              (happens-before)
          // thread 2
          bar = atomic_load_acquire(&x);
          ▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲
              (happens-before)
          return foo->a;

  24. Acquire and release operations
      ● Acquire:
        ● pthread_mutex_lock
        ● pthread_join
        ● pthread_once
        ● pthread_cond_wait
      ● Release:
        ● pthread_mutex_unlock
        ● pthread_create
        ● pthread_once (first time)
        ● pthread_cond_signal
        ● pthread_cond_broadcast
        ● pthread_cond_wait
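The `pthread_create`/`pthread_join` pair is enough to pass plain, non-atomic data between threads without a data race. The sketch below shows this with hypothetical names (`input`, `output`, `worker_double`, `double_in_thread`) and no atomics at all.

```c
#include <assert.h>
#include <pthread.h>

static int input;    /* plain, non-atomic */
static int output;   /* plain, non-atomic */

static void *worker_double(void *arg)
{
    (void)arg;
    /* pthread_create acts as a release in the parent, so the parent's
     * write to `input` is visible here. */
    output = input * 2;
    return NULL;
}

/* Hypothetical helper: doubles `value` in a separate thread. */
static int double_in_thread(int value)
{
    pthread_t t;

    input = value;                       /* happens before pthread_create */
    pthread_create(&t, NULL, worker_double, NULL);
    pthread_join(t, NULL);               /* acquire: the child's writes are visible */
    return output;
}
```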

  25. Why atomics work
      ● Atomics let threads access mutable shared data without causing data races
      ● Atomics define happens-before across threads
      ● Programs that correctly use locks to prevent all data races behave as sequentially consistent
      ● Same for programs that do not use so-called “relaxed” atomics

  26. Outline
      ● Who ordered atomics?
      ● Compilers and the need for a memory model
      ● qemu/atomic.h: portable atomics in QEMU
      ● Future work

  27. Problems with C11 atomics
      ● Only supported by very recent compilers
        ▶ Limit to what older compilers can “emulate”
      ● Very large API, few people can understand it
        ▶ Start small, later add what turns out to be useful
      ● Some rules conflict with older usage
        Linux-style barrier:
            foo->bar = 1;
            smp_wmb();
            x = foo;
        C11 fence:
            foo->bar = 1;
            atomic_thread_fence(memory_order_release);
            atomic_store_explicit(&x, foo, memory_order_relaxed);
        C11 store-release:
            foo->bar = 1;
            atomic_store_explicit(&x, foo, memory_order_release);

  28. Choosing the API
      ● Yes:
        ● Everything seq_cst (load, store, RMW)
        ● Relaxed load/store
        ● RCU load/store
      ● No:
        ● RMW operations other than seq_cst
      ● Maybe:
        ● C11-style memory barriers
        ● Load-acquire
        ● Store-release
      ● Legacy:
        ● Compiler barrier
        ● Linux-style memory barriers

  29. qemu/atomic.h API
      ● atomic_mb_read
        atomic_mb_set
      ● atomic_rcu_read
        atomic_rcu_set
      ● atomic_read
        atomic_set
      ● smp_mb
        smp_rmb (load-load)
        smp_wmb (store-store)
      ● atomic_fetch_add
        atomic_fetch_sub
        atomic_fetch_inc
        ...
      ● atomic_add
        atomic_sub
        atomic_inc
        ...
      ● atomic_xchg
      ● atomic_cmpxchg
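To give a feel for how a small API like this can sit on top of C11, here is a sketch restricted to `int`. It is not QEMU's header: the `sketch_` names are hypothetical, and the choice of memory order for each wrapper is an assumption based on the "Choosing the API" slide (relaxed plain accessors, sequentially consistent `mb` accessors and read-modify-write operations, full fence for `smp_mb`).

```c
#include <assert.h>
#include <stdatomic.h>

/* Relaxed load/store, in the spirit of atomic_read / atomic_set. */
static int sketch_read(atomic_int *p)
{
    return atomic_load_explicit(p, memory_order_relaxed);
}

static void sketch_set(atomic_int *p, int v)
{
    atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Sequentially consistent load/store, in the spirit of
 * atomic_mb_read / atomic_mb_set. */
static int sketch_mb_read(atomic_int *p) { return atomic_load(p); }
static void sketch_mb_set(atomic_int *p, int v) { atomic_store(p, v); }

/* Sequentially consistent read-modify-write operations. */
static int sketch_fetch_add(atomic_int *p, int n) { return atomic_fetch_add(p, n); }
static int sketch_xchg(atomic_int *p, int v) { return atomic_exchange(p, v); }

/* Full barrier, in the spirit of smp_mb. */
static void sketch_smp_mb(void) { atomic_thread_fence(memory_order_seq_cst); }

/* Single-threaded walk through the wrappers. */
static int sketch_demo(void)
{
    atomic_int v;
    int a, b;

    atomic_init(&v, 0);
    sketch_set(&v, 1);
    a = sketch_fetch_add(&v, 2);     /* v: 1 -> 3, returns 1 */
    b = sketch_xchg(&v, 10);         /* v: 3 -> 10, returns 3 */
    sketch_smp_mb();
    sketch_mb_set(&v, sketch_mb_read(&v) + sketch_read(&v));   /* 10 + 10 */
    return a + b + sketch_read(&v);  /* 1 + 3 + 20 */
}
```

Keeping the wrappers this thin is one way to "start small": each name commits to a single memory order, so callers never pass an order argument.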
