
Be My Guest: MCS Lock Now Welcomes Guests (PowerPoint PPT presentation)



  1. Be My Guest: MCS Lock Now Welcomes Guests
     Tianzheng Wang, University of Toronto
     Milind Chabbi, Hewlett Packard Labs
     Hideaki Kimura, Hewlett Packard Labs

  2. Protecting shared data using locks

     foo() {
         lock.acquire();
         data = my_value;
         lock.release();
     }

     Centralized spin locks (test-and-set, ticket, etc.)
     – Easy implementation
     – Widely adopted
     – But they waste interconnect traffic and cause cache ping-ponging: all contention lands on one centralized location, the lock word
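A centralized spin lock of the kind this slide describes can be sketched as follows; `TasLock` and `run_tas_demo` are our illustrative names, not code from the talk. Every waiter spins on the same shared word, which is exactly what produces the interconnect traffic and cache ping-ponging the slide mentions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal test-and-set (TAS) spinlock: one shared flag that every thread hammers.
class TasLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void acquire() {
        // All waiters spin on the same cache line: simple, but it ping-pongs
        // between cores under contention.
        while (flag.test_and_set(std::memory_order_acquire)) { /* spin */ }
    }
    void release() { flag.clear(std::memory_order_release); }
};

// Illustrative demo: nthreads threads each add `iters` to a shared counter
// under the lock; the lock makes the increments race-free.
long run_tas_demo(int nthreads, int iters) {
    TasLock lock;
    long counter = 0;
    std::vector<std::thread> ts;
    for (int i = 0; i < nthreads; ++i)
        ts.emplace_back([&] {
            for (int j = 0; j < iters; ++j) {
                lock.acquire();
                ++counter;          // the critical section
                lock.release();
            }
        });
    for (auto& t : ts) t.join();
    return counter;
}
```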

  3. MCS locks

     Non-standard interface:
     foo(qnode) {
         lock.acquire(qnode);
         data = my_value;
         lock.release(qnode);
     }

     Acquire SWAPs the lock word to enqueue; waiters (e.g., R1 granted, R2 waiting) are linked by next pointers and each spins on its own queue node.
     – Local spinning
     – FIFO order
     – But queue nodes must be threaded through everywhere
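The MCS scheme the slide sketches might look like this in C++ (our sketch; `QNode`, `MCSLock`, and `run_mcs_demo` are illustrative names): each caller supplies a queue node, enqueues itself with an atomic SWAP on the lock word, and then spins only on its own node, so hand-off is local and FIFO.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct QNode {
    std::atomic<QNode*> next{nullptr};
    std::atomic<bool>   locked{false};
};

class MCSLock {
    std::atomic<QNode*> tail{nullptr};
public:
    void acquire(QNode* me) {
        me->next.store(nullptr, std::memory_order_relaxed);
        me->locked.store(true, std::memory_order_relaxed);
        QNode* pred = tail.exchange(me, std::memory_order_acq_rel); // the SWAP
        if (pred != nullptr) {
            pred->next.store(me, std::memory_order_release); // queue behind pred
            while (me->locked.load(std::memory_order_acquire)) { /* local spin */ }
        }
    }
    void release(QNode* me) {
        QNode* succ = me->next.load(std::memory_order_acquire);
        if (succ == nullptr) {
            QNode* expected = me;
            if (tail.compare_exchange_strong(expected, nullptr,
                                             std::memory_order_acq_rel))
                return;                       // no successor: lock is free again
            // A successor is mid-enqueue; wait for it to link itself to us.
            while (!(succ = me->next.load(std::memory_order_acquire))) {}
        }
        succ->locked.store(false, std::memory_order_release); // FIFO hand-off
    }
};

long run_mcs_demo(int nthreads, int iters) {
    MCSLock lock;
    long counter = 0;
    std::vector<std::thread> ts;
    for (int i = 0; i < nthreads; ++i)
        ts.emplace_back([&] {
            QNode me;                        // the non-standard extra parameter
            for (int j = 0; j < iters; ++j) {
                lock.acquire(&me);
                ++counter;
                lock.release(&me);
            }
        });
    for (auto& t : ts) t.join();
    return counter;
}
```

Note how the queue node has to appear at every call site, which is the adoption pain the next slides discuss.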

  4. “…it was especially complicated when the critical section spans multiple functions. That required having functions also accepting an additional MCS node in its parameter.”
     - Jason Low, HPE’s Linux kernel developer

     Not easy to adopt an MCS lock with a non-standard API

  5. “…out of the 300+ places that make use of the dcache lock, 99% of the contention came from only 2 functions. Changing those 2 functions to use the MCS lock was fairly trivial...”
     - Jason Low, HPE’s Linux kernel developer

     Not all lock users are created equal

  6. Regular users vs. guests

     Regular users:
     frequent_func(qnode) {
         lock.acquire(qnode);
         ...
         lock.release(qnode);
     }

     Guests – many infrequent call sites, each still forced to thread a qnode through:
     infrequent_func1(qnode) {
         lock.acquire(qnode);
         ...
         lock.release(qnode);
     }
     infrequent_func2(qnode) { ... }

     – Transaction workers vs. DB snapshot composer
     – Worker threads vs. daemon threads

  7. Existing approaches, judged on multi-process application support and storage requirements:
     – Thread-local queue nodes: works, but bloats memory usage
     – Queue nodes on the stack (as in K42-MCS): extra memory per node
     – Cohort locks: works, but possible data layout change

  8. MCSg: best(MCS) + best(TAS)

     Guests – no queue node needed:
     bar() {
         lock.acquire();
         ...
         lock.release();
     }

     Regular users – keep all the benefits of MCS:
     foo(qnode) {
         lock.acquire(qnode);
         ...
         lock.release(qnode);
     }

  9. MCSg: use cases
     – Drop-in replacement for MCS to support guests
     – Replacement for a centralized spinlock, for performance
       – Start from all guests, then gradually identify regular users and adapt
     – Building block for composite locks
       – Same interface as MCS
       – Same storage requirement

  10. Guests in MCSg

      GUEST (drawn as a special symbol on the slide): a distinguished lock-word value meaning “a guest has the lock”

      Standard interface:
      – acquire(): CAS(NULL, GUEST), retry until success
      – release(): CAS(GUEST, NULL), retry until success

      Guests: similar to using a centralized spin lock
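The guest path on this slide can be sketched as plain CAS loops on the lock word; `GUEST` is our name for the slide's special symbol, and `MCSgWord`/`run_guest_demo` are illustrative names. With only guests present, the lock degenerates to a centralized CAS-based spinlock, which is exactly the slide's point.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct QNode;  // regular users' queue nodes; guests never touch them
// A distinguished non-null sentinel that can never be a real node address.
static QNode* const GUEST = reinterpret_cast<QNode*>(0x1);

struct MCSgWord {
    std::atomic<QNode*> tail{nullptr};  // NULL, a queue-node pointer, or GUEST

    void guest_acquire() {
        QNode* expected = nullptr;
        // CAS(NULL -> GUEST), retrying until success.
        while (!tail.compare_exchange_weak(expected, GUEST,
                                           std::memory_order_acquire))
            expected = nullptr;
    }
    void guest_release() {
        QNode* expected = GUEST;
        // CAS(GUEST -> NULL), retrying until success; a regular user that
        // slipped in between is expected to restore GUEST (next slides).
        while (!tail.compare_exchange_weak(expected, nullptr,
                                           std::memory_order_release))
            expected = GUEST;
    }
};

// Illustrative demo with guests only: behaves like a centralized spinlock.
long run_guest_demo(int nthreads, int iters) {
    MCSgWord lock;
    long counter = 0;
    std::vector<std::thread> ts;
    for (int i = 0; i < nthreads; ++i)
        ts.emplace_back([&] {
            for (int j = 0; j < iters; ++j) {
                lock.guest_acquire();
                ++counter;
                lock.guest_release();
            }
        });
    for (auto& t : ts) t.join();
    return counter;
}
```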

  11. Regular users – change in acquire()

      acquire(N1): r = SWAP(N1), where r is either a waiting predecessor or NULL
      No guest present: same as MCS

  12. Regular users – change in acquire()

      N1 is now the tail; r = SWAP(N1) returned the old tail (a waiting node or NULL)

  13. Regular users – change in acquire()

      acquire(N1): r = SWAP(N1)
      – r == NULL: got the lock
      – r == GUEST: t = SWAP(GUEST), handing GUEST back for the guest to release the lock
        – t == N1 / another ptr: retry with r = SWAP(t)

      +5 LoC in acquire(…), no change in release(…)
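A hedged sketch of the modified regular-user acquire() (our code and names, not the paper's): it covers the r == NULL case, the plain-MCS case, and the guest case where nothing slipped in between the two SWAPs; the interleaving where another waiter arrives in between needs the full protocol from the paper and is only marked here. The test walks the deterministic single-threaded paths: one guest round trip, then one regular-user round trip.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>

struct QNode {
    std::atomic<QNode*> next{nullptr};
    std::atomic<bool>   locked{false};
};
static QNode* const GUEST = reinterpret_cast<QNode*>(0x1);  // guest sentinel

struct MCSgSketch {
    std::atomic<QNode*> tail{nullptr};

    void guest_acquire() {                              // CAS(NULL -> GUEST)
        QNode* e = nullptr;
        while (!tail.compare_exchange_weak(e, GUEST,
                                           std::memory_order_acquire))
            e = nullptr;
    }
    void guest_release() {                              // CAS(GUEST -> NULL)
        QNode* e = GUEST;
        while (!tail.compare_exchange_weak(e, nullptr,
                                           std::memory_order_release))
            e = GUEST;
    }
    void acquire(QNode* me) {
        me->next.store(nullptr, std::memory_order_relaxed);
        me->locked.store(true, std::memory_order_relaxed);
        for (;;) {
            QNode* r = tail.exchange(me, std::memory_order_acq_rel);
            if (r == nullptr) return;               // no one ahead: got the lock
            if (r == GUEST) {
                // A guest holds the lock: hand GUEST back so the guest's
                // CAS(GUEST -> NULL) can succeed, then start over.
                QNode* t = tail.exchange(GUEST, std::memory_order_acq_rel);
                if (t == me) continue;              // nothing slipped in; retry
                std::abort();   // another waiter arrived between the two SWAPs:
                                // the full protocol in the paper handles this
            }
            r->next.store(me, std::memory_order_release);   // plain MCS path
            while (me->locked.load(std::memory_order_acquire)) { /* spin */ }
            return;
        }
    }
    void release(QNode* me) {                       // unchanged from plain MCS
        QNode* succ = me->next.load(std::memory_order_acquire);
        if (!succ) {
            QNode* e = me;
            if (tail.compare_exchange_strong(e, nullptr,
                                             std::memory_order_acq_rel))
                return;
            while (!(succ = me->next.load(std::memory_order_acquire))) {}
        }
        succ->locked.store(false, std::memory_order_release);
    }
};

int run_mcsg_sketch() {
    MCSgSketch lock;
    int counter = 0;
    lock.guest_acquire();
    ++counter;                                      // guest critical section
    lock.guest_release();
    QNode me;
    lock.acquire(&me);
    ++counter;                                      // regular critical section
    lock.release(&me);
    return counter;
}
```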

  14. MCSg++ extensions
      – Guest starvation
        – CAS is not guaranteed to succeed in a bounded number of steps
        – Solution: attach the guest after a regular user
      – FIFO-order violations
        – A retrying XCHG might line up after a later regular user
        – Solution: retry with a ticket

  15. Reducing guest starvation

      Queue: R1 (granted) → R2 (waiting); guest G arrives
      Steps: r = XCHG(GUEST); r.next = GuestWaiting; spin until r.next == GuestGranted; then r.next = GuestAcquired

  16. Reducing guest starvation

      G executes r = SWAP(GUEST); r is the old tail R2

  17. Reducing guest starvation

      G sets r.next = GuestWaiting (GW) and spins on it

  18. Reducing guest starvation

      R2 hands over by setting its next field to GuestGranted (GG); G’s spin ends

  19. Reducing guest starvation

      G acknowledges with r.next = GuestAcquired (GA) and enters the critical section
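The hand-off on slides 15-19 can be modeled in isolation (our names and simplification; the real protocol runs inside the lock word and queue): the guest marks the predecessor's next field GuestWaiting, the regular user grants with GuestGranted, and the guest acknowledges with GuestAcquired. The demo returns the order marker the guest observed, showing that its critical section runs strictly after the regular user's.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// States the guest writes into / reads from the predecessor's next field.
enum State { NONE, GUEST_WAITING, GUEST_GRANTED, GUEST_ACQUIRED };

int run_handshake_demo() {
    std::atomic<int> next{NONE};   // stands in for r.next on the slide
    std::atomic<int> order{0};     // records when the predecessor's work is done
    int guest_saw = -1;

    // The guest: attach behind the current tail r, then wait to be granted.
    std::thread guest([&] {
        next.store(GUEST_WAITING, std::memory_order_release);   // r.next = GW
        while (next.load(std::memory_order_acquire) != GUEST_GRANTED) {}
        guest_saw = order.load(std::memory_order_relaxed);      // hand-off done
        next.store(GUEST_ACQUIRED, std::memory_order_release);  // ack: r.next = GA
    });

    // The regular user r: finish its critical section, then grant the guest.
    order.store(1, std::memory_order_relaxed);                  // r's critical section
    while (next.load(std::memory_order_acquire) != GUEST_WAITING) {}
    next.store(GUEST_GRANTED, std::memory_order_release);       // r.next = GG
    while (next.load(std::memory_order_acquire) != GUEST_ACQUIRED) {}

    guest.join();
    return guest_saw;   // 1: the guest observed r's completed critical section
}
```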

  20. Evaluation
      – HP DragonHawk
        – 15-core Xeon E7-4890 v2 @ 2.80 GHz
        – 16 sockets × 15 cores = 240 physical cores
        – 256 KB L2 per core, 38 MB L3 per socket, 12 TB DRAM
      – Microbenchmarks
        – MCSg, MCSg++, CLH, K42-MCS, TATAS
        – Critical section: 2 cache-line accesses, high contention
      – TPC-C with MCSg in FOEDUS, an open-source database

  21. Maintaining MCS’s scalability
      – TPC-C Payment, 192 workers
      – Highly contended: one warehouse

      Lock  | MTPS | STDEV
      TATAS | 0.33 | 0.095
      MCS   | 0.46 | 0.011
      MCSg  | 0.45 | 0.004

  22. (Chart) One guest + 223 regular users vs. 224 regular users

  23. (Chart) One guest + 223 regular users: the guest is starved

  24. (Chart) Varying number of guests: total throughput, without ticketing

  25. (Chart) Varying number of guests: guest throughput, without ticketing

  26. Conclusions
      – Not all lock users are created equal
        – Pervasive guests prevent easy adoption of the MCS lock
      – MCSg: a dual-interface lock
        – Regular users: acquire/release(lock, qnode)
        – Infrequent guests: acquire/release(lock)
      – Easy to implement: ~20 additional LoC
      – As scalable as MCS (guests being a minority at runtime)

      Find out more in our paper!

