
Transactional Garbage and how to collect it for fun and profit



  1. Transactional Garbage and how to collect it for fun and profit. Fadi Meawad, Ryan Macnak, Jan Vitek. S3Lab, Computer Science Dept, Purdue University.

  2. [Figure: C# STMBench7 on Bartok. % of time spent in GC vs. # of threads.] 8-core, 1.60GHz Intel Xeon E5310. 8GB RAM. Physical Address Extension enabled, running Windows Server 2003 SP2.

  3. Up to 98% of time spent in GC. [Figure: C# STMBench7 on Bartok. % of time spent in GC vs. # of threads.] 8-core, 1.60GHz Intel Xeon E5310. 8GB RAM. Physical Address Extension enabled, running Windows Server 2003 SP2.

  4. What does this have to do with Transactional Memory?

  5. Let’s Benchmark
     GCBench
     ‣ New micro-benchmark, creates a linked list of lists
     ‣ A transaction traverses the lists and, at every node, does one of:
       • Update the node
       • Allocate an unreachable or live object
       • Allocate an object that is unreachable after commit
       • Make an object unreachable
     Wormbench
     ‣ C#, designed for the Bartok STM
     ‣ A “worm” with a triangular head and a tail lives in a matrix with other worms
     ‣ 15 ops, such as move forward or turn right

  6. Let’s Benchmark
     STMBench7
     ‣ Trees, graphs and indices as in CAD/CAM workloads
     ‣ 500MB of data
     ‣ Configurable as a read-, read/write- or write-dominated workload
     ‣ Long or short traversals
     LeeTM
     ‣ Automatic circuit routing using Lee’s algorithm
     ‣ Pairs of points on a grid are connected with non-intersecting paths

  7. [Figure: C# GCBench. Slowdown vs. list size.] 8-core, 1.60GHz Intel Xeon E5310. 8GB RAM. Physical Address Extension enabled. Windows Server 2003 SP2.

  8. Where did that time go? C# GCBench, % of time spent in GC.

     # of threads   list size:  100   200   300   400   500   600   700   800
     1                           0%    9%   23%   36%   48%   54%   61%   67%
     2                          12%   35%   56%   69%   74%   82%   85%   88%
     3                          25%   53%   72%   79%   87%   91%   93%   95%
     4                          39%   66%   81%   88%   92%   95%   96%   97%
     5                          40%   75%   87%   92%   95%   96%   97%   98%
     6                          49%   81%   90%   95%   97%   98%   98%   99%
     7                          60%   82%   93%   96%   97%   98%   99%   99%
     8                          64%   85%   94%   97%   98%   99%   99%   99%

  9. Maybe it is just in Bartok!

  10. What is Bartok STM?
      Optimizing ahead-of-time research compiler & runtime
      STM
      ‣ Object-based, in-place, optimistic updates
      ‣ Read-enlistment, update-enlistment & undo-value logs
        • Allocated from the normal heap
      ‣ Transaction ID is stored in the object's header
      GC
      ‣ 2-generational semispace copying collector
      ‣ Stop the world
      [Harris, Plesko, Shinnar, Tarditi, Optimizing memory transactions, PLDI 06]

  11. GC % using Multiverse. Java GCBench size 800, % of time spent in GC. [Table: GC % and total time (seconds) for Unsync'd and Multiverse runs, plus slowdown, for 1 to 8 threads.]

  12. GC % using Multiverse. (Repeat of slide 11.)

  13. Does it depend on the GC? [Figure: GCBench, size 600. Execution time (seconds) vs. # of threads.]

  14. Does the problem scale? [Figure: GCBench size 800, Azul. Execution time (seconds) vs. # of threads.] Azul Vega 3 3310B, two 54-core processors. 48GB of RAM, Azul VM. Concurrent Pauseless GC.

  15. Does the problem scale? [Figure: GCBench size 800, Azul. Memory usage (GB) vs. # of threads.]

  16. What can we do about it?

  17. Logs in Bartok
      Object-based, in-place updates with undo logs
      ‣ The read-object log contains the read STM Word (version #)
      ‣ An object opened for update has an updated-object log entry (previous STM word)
      ‣ Upon update, the old value is maintained in an undo log
      ‣ Logs are allocated in chunks maintained by the STM
      ‣ Discarded at end of transaction
      [Diagram: a List object (Head, Tail, Sum = 42) and Node1 (Value = 10, Next = null), each with a VTable and an STM word, linked to read and update log entries; the transaction manager tracks the log chunks and the offset in the current log chunk.]
      [Harris, Plesko, Shinnar, Tarditi, Optimizing memory transactions, PLDI 06]
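
The undo-log idea on this slide can be illustrated with a minimal Java sketch. It is not Bartok code: the class `UndoLogSketch`, its `Node` type and all method names are hypothetical, and it ignores versioning, read logs and concurrency. It only shows that an in-place update saves the old value first, that commit discards the log, and that abort replays it newest-first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration of in-place updates guarded by an undo log. */
final class UndoLogSketch {
    /** A toy transactional object with a single int field. */
    static final class Node {
        int value;
        Node(int value) { this.value = value; }
    }

    /** One undo entry: which object was written and what it held before. */
    private static final class Entry {
        final Node target;
        final int oldValue;
        Entry(Node target, int oldValue) {
            this.target = target;
            this.oldValue = oldValue;
        }
    }

    private final Deque<Entry> undoLog = new ArrayDeque<>();

    /** Update in place, saving the old value in the undo log first. */
    void write(Node target, int newValue) {
        undoLog.push(new Entry(target, target.value));
        target.value = newValue;
    }

    /** Commit: the updates are already in place, so the log is just dropped. */
    void commit() {
        undoLog.clear();
    }

    /** Abort: replay the log newest-first to restore the old values. */
    void abort() {
        while (!undoLog.isEmpty()) {
            Entry e = undoLog.pop();
            e.target.value = e.oldValue;
        }
    }

    int logSize() {
        return undoLog.size();
    }
}
```

Every `Entry` here is an ordinary heap object that becomes garbage as soon as the transaction ends, which is the kind of allocation the following slides try to keep away from the collector.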

  18. What is log reuse, and is it enough?
      Log Reuse
      ‣ At transaction end, preserve log chunks in a pool rather than leave them for the GC
      ‣ When a transaction needs a new log chunk, try the pool first, otherwise allocate
      Issues
      ‣ Hard to decide when to deallocate log chunks
      ‣ Large initializing Txs followed by small ones will result in a large unused pool
      ‣ The pool will be traced by the GC
      ‣ Weak references are expensive
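
A minimal Java sketch of the log-reuse scheme described above, under stated assumptions: `LogChunkPool`, `acquire` and `release` are hypothetical names, a chunk is modelled as a plain `long[]`, and the `maxPooled` cap is an illustrative way to bound the unused pool rather than something the slides propose.

```java
import java.util.ArrayDeque;

/** Hypothetical pool that recycles log chunks instead of leaving them to the GC. */
final class LogChunkPool {
    private final ArrayDeque<long[]> pool = new ArrayDeque<>();
    private final int chunkWords;
    private final int maxPooled; // assumption: a simple cap on pooled chunks

    LogChunkPool(int chunkWords, int maxPooled) {
        this.chunkWords = chunkWords;
        this.maxPooled = maxPooled;
    }

    /** Try the pool first; allocate a fresh chunk only if the pool is empty. */
    synchronized long[] acquire() {
        long[] chunk = pool.pollFirst();
        return chunk != null ? chunk : new long[chunkWords];
    }

    /** At transaction end, keep the chunk for reuse unless the pool is full. */
    synchronized void release(long[] chunk) {
        if (pool.size() < maxPooled) {
            pool.addFirst(chunk);
        }
        // Otherwise the chunk is dropped and becomes ordinary garbage.
    }

    synchronized int pooled() {
        return pool.size();
    }
}
```

The pooled arrays still live on the garbage-collected heap, so a tracing collector keeps visiting them, which is one of the issues the slide lists.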

  19. Dedicated Nurseries
      Generational-GC nurseries
      ‣ Most objects either die young or live forever
      Transaction nurseries
      ‣ Objects allocated in a transaction are not visible to other threads until commit
      ‣ Reclaim the nursery in one step after abort
      ‣ Can support nested transactions
      ‣ Finalization?
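
The "reclaim in one step" property can be sketched with a bump-pointer region. This Java sketch is an assumption-laden illustration, not the authors' design: `TxNursery` and its methods are hypothetical, allocations are modelled as offsets into a byte array, and promotion of surviving objects at commit is left out. Nested transactions are approximated by saving and restoring a mark.

```java
/** Hypothetical per-transaction nursery using bump-pointer allocation. */
final class TxNursery {
    private final byte[] space;
    private int top; // bump pointer: next free offset

    TxNursery(int capacityBytes) {
        this.space = new byte[capacityBytes];
    }

    /** Returns the offset of the new block, or -1 if the nursery is full. */
    int allocate(int sizeBytes) {
        if (sizeBytes < 0 || sizeBytes > space.length - top) {
            return -1;
        }
        int offset = top;
        top += sizeBytes;
        return offset;
    }

    /** Remember the current top, for example when a nested transaction starts. */
    int mark() {
        return top;
    }

    /** Abort of a nested transaction: drop everything allocated since the mark. */
    void resetTo(int mark) {
        if (mark < 0 || mark > top) {
            throw new IllegalArgumentException("invalid mark");
        }
        top = mark;
    }

    /** Abort of the whole transaction: reclaim the nursery in one step. */
    void reclaimOnAbort() {
        top = 0;
    }

    int used() {
        return top;
    }
}
```

Reclamation costs a single pointer reset regardless of how many objects the aborted transaction allocated, because none of them were visible to other threads.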

  20. Dedicated Nurseries

  21. Dedicated Heap
      ‣ Much of transactional allocation is logs
      ‣ Lifetime of logs is bounded by the transaction they serve
      ‣ Known lifetime allows manual memory management
      ‣ Cheap to allocate/free in chunks from a mutex-protected free list
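
A minimal Java sketch of chunk allocation from a mutex-protected free list. The names `DedicatedLogHeap`, `allocateChunk` and `freeChunk` are hypothetical, and direct `ByteBuffer`s are used only as a stand-in for memory whose contents live outside the garbage-collected heap; the original system works inside the Bartok runtime, not in Java.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical dedicated heap for log chunks, managed manually. */
final class DedicatedLogHeap {
    private final ReentrantLock lock = new ReentrantLock();
    private final ArrayDeque<ByteBuffer> freeList = new ArrayDeque<>();
    private final int chunkBytes;

    DedicatedLogHeap(int chunkBytes, int initialChunks) {
        this.chunkBytes = chunkBytes;
        for (int i = 0; i < initialChunks; i++) {
            // Direct buffers keep their contents off the Java heap.
            freeList.push(ByteBuffer.allocateDirect(chunkBytes));
        }
    }

    /** Take a chunk from the free list, growing the heap if it is empty. */
    ByteBuffer allocateChunk() {
        lock.lock();
        try {
            ByteBuffer chunk = freeList.poll();
            return chunk != null ? chunk : ByteBuffer.allocateDirect(chunkBytes);
        } finally {
            lock.unlock();
        }
    }

    /** Return a chunk when its transaction ends; the log lifetime is known. */
    void freeChunk(ByteBuffer chunk) {
        chunk.clear(); // reset position and limit; contents are overwritten later
        lock.lock();
        try {
            freeList.push(chunk);
        } finally {
            lock.unlock();
        }
    }

    int freeChunks() {
        lock.lock();
        try {
            return freeList.size();
        } finally {
            lock.unlock();
        }
    }
}
```

The lock is taken once per chunk rather than once per log entry, which is what keeps the allocate/free path cheap in this sketch.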

  22. Dedicated Heap

  23. Does it work? [Figure: C# GCBench. % total speedup vs. list size.] 8-core, 1.60GHz Intel Xeon E5310. 8GB RAM. Physical Address Extension enabled, Windows Server 2003 SP2.

  24. Does it work? [Figure: C# STMBench7. Ops per second vs. # of threads.]

  25. Does it work? [Figure: C# STMBench7. % increase in ops per second vs. # of threads.]

  26. LeeTM, issues
      Issues
      ‣ Allocates a temp grid within the transaction
      ‣ Bartok STM does not optimize multidimensional array access (excessive logging)
      Workarounds
      ‣ Allocate before the transaction (Opt)
      ‣ Use RowMajor array access (RM)
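
One plausible reading of the RowMajor workaround is to replace a multidimensional grid with a flat one-dimensional array indexed as `row * cols + col`. The Java sketch below illustrates only that indexing scheme; `RowMajorGrid` is a hypothetical name, and whether the LeeTM port was changed in exactly this way is an assumption.

```java
/** Hypothetical grid stored as one flat array in row-major order. */
final class RowMajorGrid {
    private final int rows;
    private final int cols;
    private final int[] cells;

    RowMajorGrid(int rows, int cols) {
        if (rows <= 0 || cols <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.rows = rows;
        this.cols = cols;
        this.cells = new int[rows * cols];
    }

    /** Row-major mapping: all cells of a row are contiguous. */
    int index(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return row * cols + col;
    }

    int get(int row, int col) {
        return cells[index(row, col)];
    }

    void set(int row, int col, int value) {
        cells[index(row, col)] = value;
    }
}
```

With this layout every access touches a single array object, so a runtime that logs per object access has one array to deal with instead of a multidimensional structure.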

  27. Results (cont’d) [Figure: C# LeeTM. Total time (seconds) vs. # of threads.]

  28. Results (cont’d) [Figure: C# LeeTM. % of time saved vs. # of threads.] Running on an 8-core, 1.60GHz Intel Xeon E5310 with 8GB of RAM and Physical Address Extension enabled, running Windows Server 2003 SP2.

  29. Conclusion
      Memory Usage
      ‣ Same overall allocated memory
      ‣ Fewer demands on the GCed heap
      Speed Up
      Applying to other systems
      ‣ Not for library-based STM systems, but with runtime support it will work with most STM flavors
