
Fast, less-complicated, lock-free Data Structures Ulrich Drepper - PowerPoint PPT Presentation



  1. Fast, less-complicated, lock-free Data Structures Ulrich Drepper ulrich.drepper@gs.com

  2. Accelerate Code
     ● Not (much) through new hardware
     ● Split into independent pieces
     ● Splitting comes at a cost
     ● Marshaling between stages
     ● Increased latency for pipeline
     ● Realistically: Parallelization needed!

  3. Parallelization
     ● Alternatives
       ● Multi-process, or
       ● Multi-thread
     ● Error prone
     ● High level of parallelization needed
     ● Keep cost of parallelization ($O_P$) low
     Extended “Amdahl's Law”: $S = \frac{1}{(1 - P) + \frac{P}{N}\,(1 + O_P)}$
     [Plot: speedup (y-axis 0 to 2.5) over x-axis 1 to 16, curve for P = 0.6]
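The extended Amdahl's Law on slide 3 can be evaluated directly. The following is a minimal sketch in C; the function name `extended_amdahl_speedup` and its parameter names are illustrative assumptions, not part of the slides. It only restates the formula, with `o_p = 0` giving the classic law.

```c
#include <assert.h>

/* Speedup according to the extended Amdahl's Law from slide 3:
 *     S = 1 / ((1 - P) + (P / N) * (1 + O_P))
 * p   : fraction of the work that can run in parallel (0..1)
 * n   : number of execution units (>= 1)
 * o_p : relative overhead of parallelization (0 = classic Amdahl)
 * The function and parameter names are hypothetical. */
double extended_amdahl_speedup(double p, int n, double o_p)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 1 && o_p >= 0.0);
    return 1.0 / ((1.0 - p) + (p / n) * (1.0 + o_p));
}
```

With P = 0.6 the speedup can never exceed 1 / (1 - 0.6) = 2.5, which matches the upper end of the plot's y-axis; any overhead `o_p > 0` lowers the result further for the same `n`.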

  4. Parallelization
     ● Collaboration through shared memory
     ● Synchronized access
     ● Synchronized access to data structures
     ● Atomic data structures (mostly based on Compare-And-Swap)
     Semantics of the builtin, executed as one atomic step:
         bool __sync_bool_compare_and_swap(TYPE *ptr, TYPE oldval, TYPE newval)
         {
           if (*ptr != oldval)
             return false;
           *ptr = newval;
           return true;
         }
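Compare-And-Swap is normally used in a retry loop: read the current value, compute the new one, and try to install it, repeating if another thread changed the location in the meantime. The sketch below assumes GCC or Clang (for the `__sync` and `__atomic` builtins); the helper name `cas_add` is hypothetical.

```c
#include <assert.h>

/* Atomically add delta to *counter using only compare-and-swap.
 * Returns the value that this call installed.
 * cas_add is a hypothetical helper name, not taken from the slides. */
long cas_add(long *counter, long delta)
{
    long old;
    do {
        /* Snapshot the current value; it may be stale by the time we CAS. */
        old = __atomic_load_n(counter, __ATOMIC_RELAXED);
        /* The CAS succeeds only if *counter still equals the snapshot. */
    } while (!__sync_bool_compare_and_swap(counter, old, old + delta));
    return old + delta;
}
```

A caller would use it as `cas_add(&hits, 1)` from any number of threads without taking a lock; a failed CAS simply means another thread won the race and the loop tries again.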

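  5. Lock-Free Data Structures
     [Table: columns LIFO, FIFO, Single Linked, Double Linked, Hash; rows No Priority and Priority, each split into 1:1, 1:N, N:1, M:N. Column placement of the entries was lost in extraction; entries per row in extraction order:]
     ● No Priority, 1:1: CAS, CAS
     ● No Priority, 1:N: CAS
     ● No Priority, N:1: CAS, CAS
     ● No Priority, M:N: CAS
     ● Priority, 1:1: CAS, CAS
     ● Priority, 1:N: (no entry)
     ● Priority, N:1: CAS, CAS
     ● Priority, M:N: (no entry)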
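As an illustration of a CAS-based entry in the table on slide 5, the following sketch shows a lock-free LIFO with a push and a "detach everything" operation. It is an assumption-laden example, not the author's implementation: `struct node`, `lifo_push` and `lifo_pop_all` are hypothetical names, and a single-element pop is deliberately left out because it needs extra care (the ABA problem) that a plain single-word CAS does not solve.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive list node. */
struct node {
    struct node *next;
    int value;
};

/* Push n onto the stack whose top pointer is *head. The CAS is retried
 * until no other thread changed *head between the load and the swap. */
void lifo_push(struct node **head, struct node *n)
{
    struct node *old;
    do {
        old = __atomic_load_n(head, __ATOMIC_RELAXED);
        n->next = old;
    } while (!__sync_bool_compare_and_swap(head, old, n));
}

/* Detach the whole stack in one step and return its former top.
 * The old top is never dereferenced before the CAS, so this operation
 * is not affected by the ABA problem. */
struct node *lifo_pop_all(struct node **head)
{
    struct node *old;
    do {
        old = __atomic_load_n(head, __ATOMIC_RELAXED);
    } while (!__sync_bool_compare_and_swap(head, old, NULL));
    return old;
}
```

The consumer walks the returned list privately, so many producers and one or more consumers can share `head` without a mutex.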

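  6. x86 Special
     [Same table as slide 5, with additional DWCAS (Double-wide CAS) entries. Column placement of the entries was lost in extraction; entries per row in extraction order:]
     ● No Priority, 1:1: CAS, CAS
     ● No Priority, 1:N: CAS, DWCAS
     ● No Priority, N:1: CAS, CAS
     ● No Priority, M:N: CAS, DWCAS
     ● Priority, 1:1: CAS, CAS
     ● Priority, 1:N: (no entry)
     ● Priority, N:1: CAS, CAS
     ● Priority, M:N: (no entry)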

  7. Extended CAS
     ● Wider, more complicated CAS not the answer
     “DCAS is not a Silver Bullet for Nonblocking Algorithm Design”, Doherty, Detlefs, Groves, Flood, Luchangco, Martin, Moir, Shavit, Steele, SPAA '04, 2004

  8. Locking
     ● Bane of Programming
     ● Interface design: explicit or implicit locking?
     ● Often unnecessary overhead
     ● Composability problem
     ● AB-BA locking problem
         void move(dbllist<T> &target, dbllist<T>::it &prev, dbllist<T> &source, dbllist<T>::it &elem);
       How to implement internal locking?
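One common workaround for the AB-BA problem on slide 8, though not one the slides themselves propose, is to always acquire the two locks in a fixed global order, for example by address. The sketch below assumes GCC or Clang builtins; `spin_mutex`, `lock_pair` and `unlock_pair` are hypothetical names, and the busy-waiting lock stands in for whatever mutex a real list would embed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal lock: 0 = free, 1 = held. */
typedef struct { int locked; } spin_mutex;

static void spin_lock(spin_mutex *m)
{
    while (!__sync_bool_compare_and_swap(&m->locked, 0, 1))
        ;  /* busy-wait until the lock is free */
}

static void spin_unlock(spin_mutex *m)
{
    __sync_lock_release(&m->locked);  /* stores 0 with release semantics */
}

/* Acquire both locks in address order, so two threads calling
 * lock_pair(a, b) and lock_pair(b, a) cannot deadlock each other. */
void lock_pair(spin_mutex *a, spin_mutex *b)
{
    if (a == b) {              /* same object, e.g. a move within one list */
        spin_lock(a);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        spin_mutex *tmp = a;
        a = b;
        b = tmp;
    }
    spin_lock(a);
    spin_lock(b);
}

void unlock_pair(spin_mutex *a, spin_mutex *b)
{
    spin_unlock(a);
    if (a != b)
        spin_unlock(b);
}
```

This avoids the deadlock but not the composability problem: every operation that touches two containers has to know about the ordering rule, which is part of why the slides look for alternatives to locking.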

  9. Locking and Latency
     ● Yes, there are spinlocks
     ● Fairer/more power efficient locking requires sleep
     ● Sleep requires wakeup
     [Timeline diagram; labels: Detect Lock Collision, Enter Kernel, Delay, Wakeup Signal, Wake, Exit Kernel, Latency, Resume Lock Operation]

  10. Way Forward
      Two complementary approaches
      ● Improve implementation of locking to
        ● Reduce contention
        ● Reduce cost of the operation
      ● Replace concept of locking

  11. Way Forward
      Two complementary approaches
      ● Improve implementation of locking to
        ● Reduce contention
        ● Reduce cost of the operation
        → Hardware Lock Elision (HLE)
      ● Replace concept of locking
        → Transactional Memory (TM)

  12. Increase Parallelism
      ● Reduce lock contention
      ● Avoid “optimizations” like reader-writer locks
      ● Enable more code to be parallelized
      [Plot: speedup (y-axis 0 to 4) over x-axis 1 to 16, curves for P = 0.6 and P = 0.8]

  13. Running Example

  14. Locking Hash Tables
      ● Designed for concurrent accesses
      ● In practice mostly read accesses
      ● Even write accesses likely will not conflict
      ● Locking is overkill
      [Diagram: Thread 1 and Thread 2 accessing Separate Memory Locations]

  15. Hash Table With locking

  16. Mutually Exclusive Access
      [Flowchart: CAS(mutex, 0, 1) tests “Mutex == 0? Set 1”. No: Delay. Yes: Read Table Entry / Update Table Entry, then Store 0 in Mutex and Wake.]

  17. Mutually Exclusive Access
      [Same flowchart, annotated with the memory each step touches: Mutex Memory and Hash Tab Memory.]

  18. Mutually Exclusive Access
      [Same flowchart.] Net Effect On Mutex: Nothing
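The flow on slides 16 to 18 can be sketched in C as a table guarded by a single mutex word that is taken with CAS(mutex, 0, 1) and released by storing 0. This is a simplified illustration under stated assumptions: the names `table_read`, `table_update` and `TABLE_SIZE` are hypothetical, the "Delay" and "Wake" steps are reduced to busy-waiting, and GCC or Clang builtins are assumed.

```c
#include <assert.h>

#define TABLE_SIZE 8            /* hypothetical table size */

static int mutex;               /* 0 = free, 1 = held */
static int table[TABLE_SIZE];   /* stand-in for the hash table entries */

static void table_lock(void)
{
    /* CAS(mutex, 0, 1): succeeds only if the mutex was free. */
    while (!__sync_bool_compare_and_swap(&mutex, 0, 1))
        ;  /* "Delay" on the slide; here simply spin and retry */
}

static void table_unlock(void)
{
    __sync_lock_release(&mutex);  /* "Store 0 in Mutex" */
}

int table_read(unsigned slot)
{
    table_lock();
    int value = table[slot % TABLE_SIZE];
    table_unlock();
    return value;
}

void table_update(unsigned slot, int value)
{
    table_lock();
    table[slot % TABLE_SIZE] = value;
    table_unlock();
}
```

After every operation the mutex is back at 0, which is the observation on slide 18: the net effect on the mutex is nothing, yet every access still had to write to it.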

  19. Hardware Lock Elision

  20. With Lock Elision
      [Same flowchart as slide 16.] What if '1' is not written?

  21. With Lock Elision
      [Same flowchart, with Thread 1 and Thread 2 marked.]

  22. With Lock Elision
      [Same flowchart, with Thread 1 and Thread 2 marked.] No Mutual Exclusion!

  23. No Mutual Exclusion
      ● Bad
      ● But only if
        ● Concurrent access to same memory location
        ● At least one of the accesses is write

  24. Alternative
      [Same flowchart, with Thread 1 and Thread 2 marked.] Detect Collisions!
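The collision rule from slide 23 (same memory location, at least one write) can be written down as a small predicate over the accesses two threads made inside their critical sections. This is only a software model for illustration; `struct access` and `accesses_collide` are hypothetical names, and real hardware tracks accesses at a coarser granularity such as cache lines rather than per location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One memory access made inside a critical section (hypothetical model). */
struct access {
    size_t location;  /* illustrative location id, e.g. a table index */
    bool is_write;
};

/* Two access sets collide if some location appears in both sets and
 * at least one of the two accesses to it is a write. */
bool accesses_collide(const struct access *a, size_t na,
                      const struct access *b, size_t nb)
{
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i].location == b[j].location &&
                (a[i].is_write || b[j].is_write))
                return true;
    return false;
}
```

If the write of '1' to the mutex is elided, both threads only read the mutex, so it never causes a collision by itself; only the table entries decide whether the critical sections can run concurrently.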

  25. Intel HLE

  26. x86 code for Hash Table
      Thread 1:
          lock cmpxchg %ebx, mut
          jne 2f
          mov table+2, %edx
          mov $0, mut
          call wake
      Thread 2:
          lock cmpxchg %ebx, mut
          jne 2f
          mov $4, table+5
          mov $0, mut
          call wake
      [Diagram: L1 Data Cache next to each thread; Main Memory holding the Hash Table (value 42 shown) and the Mutex (value 0)]

  27. New in Intel HLE
      Thread 1:
          xacquire lock cmpxchg %ebx, mut
          jne 2f
          mov table+2, %edx
          xrelease mov $0, mut
          call wake
      Thread 2:
          xacquire lock cmpxchg %ebx, mut
          jne 2f
          mov $4, table+5
          xrelease mov $0, mut
          call wake
      ● New Instruction Prefixes (compatible)
      [Diagram: Transaction Flag and Lock Cache added to each thread's cache; Hash Table (value 42 shown) and Mutex (value 0)]
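In C, the `xacquire` and `xrelease` prefixes from slide 27 can be requested through GCC's `__atomic` builtins by OR-ing `__ATOMIC_HLE_ACQUIRE` or `__ATOMIC_HLE_RELEASE` into the memory order. The sketch below is hedged in several ways: it uses an atomic exchange instead of the slide's `cmpxchg`, it spins instead of calling a `wake` routine, the names `elided_lock`, `elided_unlock`, `elided_update`, `HLE_ACQ` and `HLE_REL` are hypothetical, and on compilers that do not define the HLE macros it falls back to an ordinary lock.

```c
#include <assert.h>

/* GCC defines the HLE flags when targeting x86; elsewhere this sketch
 * degrades to a plain test-and-set lock. HLE_ACQ/HLE_REL are hypothetical
 * helper macros. */
#ifdef __ATOMIC_HLE_ACQUIRE
#define HLE_ACQ __ATOMIC_HLE_ACQUIRE
#define HLE_REL __ATOMIC_HLE_RELEASE
#else
#define HLE_ACQ 0
#define HLE_REL 0
#endif

static int mut;        /* 0 = free, 1 = held, as on the slides */
static int table[8];   /* stand-in for the hash table */

static void elided_lock(void)
{
    /* Requests an xacquire-prefixed exchange where supported. Processors
     * without HLE ignore the prefix and take the lock normally. */
    while (__atomic_exchange_n(&mut, 1, __ATOMIC_ACQUIRE | HLE_ACQ))
        ;  /* lock is held by someone else: spin and retry */
}

static void elided_unlock(void)
{
    /* Requests an xrelease-prefixed store of 0 where supported. */
    __atomic_store_n(&mut, 0, __ATOMIC_RELEASE | HLE_REL);
}

void elided_update(unsigned slot, int value)
{
    elided_lock();
    table[slot % 8] = value;
    elided_unlock();
}
```

The calling code looks exactly like ordinary locking, which is the point of the "(compatible)" note on the slide: the same binary runs with or without elision support.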

  28. Successful Concurrent Use

  29. No Collision
      Thread 1:
          xacquire lock cmpxchg %ebx, mut
          jne 2f
          mov table+2, %edx
          xrelease mov $0, mut
          call wake
      Thread 2:
          xacquire lock cmpxchg %ebx, mut
          jne 2f
          mov $4, table+5
          xrelease mov $0, mut
          call wake
      [Diagram: Hash Table (value 42 shown), Mutex 0; no entries in either thread's cache. In the following slides the code is unchanged and “T” marks an entry with the transaction flag set.]

  30. No Collision
      ● Thread 1 cache: T 1 (Old: 0)

  31. No Collision
      ● Thread 1 cache: T 42; T 1 (Old: 0)

  32. No Collision
      ● Thread 1 cache: T 42; T 1 (Old: 0)
      ● Thread 2 cache: T 1 (Old: 0)

  33. No Collision
      ● Thread 1 cache: T 42; T 1 (Old: 0)
      ● Thread 2 cache: T 4; T 1 (Old: 0)

  34. No Collision
      ● Thread 1 cache: T 42; T 1 ✓ (Old: 0)
      ● Thread 2 cache: T 4; T 1 (Old: 0)

  35. No Collision
      ● Thread 1 cache: 42; 1 0 (Old: 0)
      ● Thread 2 cache: T 4; T 1 ✓ (Old: 0)

  36. No Collision
      ● Thread 1 cache: 42; 0
      ● Thread 2 cache: 4; 1 0 (Old: 0)
      ● Hash Table: 42 and 4; Mutex 0

  37. Unsuccessful Concurrent Use

  38. With Collision
      Thread 1:
          xacquire lock cmpxchg %ebx, mut
          jne 2f
          mov table+2, %edx
          xrelease mov $0, mut
          call wake
      Thread 2:
          xacquire lock cmpxchg %ebx, mut
          jne 2f
          mov $4, table+2
          xrelease mov $0, mut
          call wake
      [Diagram: Hash Table (value 42 shown), Mutex 0; no entries in either thread's cache. In the following slides the code is unchanged and “T” marks an entry with the transaction flag set.]

  39. With Collision
      ● Thread 1 cache: T 1 (Old: 0)

  40. With Collision
      ● Thread 1 cache: T 42; T 1 (Old: 0)

  41. With Collision
      ● Thread 1 cache: T 42; T 1 (Old: 0)
      ● Thread 2 cache: T 1 (Old: 0)

  42. With Collision
      ● Thread 1 cache: T 42; T 1 (Old: 0), with marker glyphs (lost in extraction) on Thread 1's cmpxchg and jne lines
      ● Thread 2 cache: T 4; T 1 (Old: 0)

  43. With Collision
      ● Thread 1 cache: T 42; T 1 (Old: 0), with marker glyphs (lost in extraction) on Thread 1's cmpxchg and jne lines and on its T 1 entry
      ● Thread 2 cache: T 4; T 1 (Old: 0)
