Transactional Garbage and how to collect it for fun and profit
Fadi Meawad, Ryan Macnak, Jan Vitek
S3Lab, Computer Science Dept, Purdue University

% of time spent in GC
[Figure: C# STMBench7 on Bartok, % of time spent in GC (up to 98%) vs. # of threads. 8-core, 1.60GHz Intel Xeon E5310. 8GB RAM. Physical Address Extension enabled, running Windows Server 2003 SP2.]

What does this have to do with Transactional Memory?

Let’s Benchmark
GCBench
‣ New micro-benchmark, creates a linked list of lists
‣ A transaction traverses lists, at every node, either:
  • Update node
  • Allocate unreachable or live object
  • Allocate object unreachable after commit
  • Make object unreachable
Wormbench
‣ C#, designed for the Bartok STM
‣ A “worm” with a triangular head and a tail lives in a matrix with other worms
‣ 15 ops, such as move forward or turn right

Let’s Benchmark
STMBench7
‣ Trees, graphs and indices as in CAD/CAM workloads
‣ 500MB of data
‣ Configure either r-, r/w- or w-dominated workload
‣ Long or short traversals
LeeTM
‣ Automatic circuit routing using Lee’s algorithm
‣ Pairs of points on a grid connected with non-intersecting paths

[Figure: C# GCBench, slowdown vs. list size. 8-core, 1.60GHz Intel Xeon E5310. 8GB RAM. Physical Address Extension enabled. Windows Server 2003 SP2.]

Where did that time go?
C# GCBench, % of time spent in GC
              list size
# of threads  100  200  300  400  500  600  700  800
1              0%   9%  23%  36%  48%  54%  61%  67%
2             12%  35%  56%  69%  74%  82%  85%  88%
3             25%  53%  72%  79%  87%  91%  93%  95%
4             39%  66%  81%  88%  92%  95%  96%  97%
5             40%  75%  87%  92%  95%  96%  97%  98%
6             49%  81%  90%  95%  97%  98%  98%  99%
7             60%  82%  93%  96%  97%  98%  99%  99%
8             64%  85%  94%  97%  98%  99%  99%  99%

Maybe it is just in Bartok!

What is Bartok STM?
Optimizing ahead-of-time research compiler & runtime
STM
‣ Object-based, in-place, optimistic updates
‣ Read-enlistment, update-enlistment & undo-value logs
  • Allocated from normal heap
‣ Transaction ID is stored in object's header
GC
‣ 2-generational semispace copying collector
‣ Stop the world
[Harris, Plesko, Shinnar, Tarditi, Optimizing memory transactions, PLDI 06]

GC % using Multiverse
[Table: Java GCBench, size 800, % of time spent in GC by # of threads. The column headers and cell values are garbled in this copy.]

Does it depend on the GC?
[Figure: GCBench, size 600, execution time (seconds) vs. # of threads.]

Does the problem scale?
[Figure: GCBench size 800 on Azul, execution time (seconds) vs. # of threads. Azul Vega 3 3310B, two 54-core processors. 48GB of RAM, Azul VM. Concurrent Pauseless GC.]

Does the problem scale?
[Figure: GCBench size 800 on Azul, memory usage (GB) vs. # of threads.]

What can we do about it?

Logs in Bartok
‣ Object-based in-place updates with undo logs
‣ The read-object log contains the read STM word (version #)
‣ An object opened for update has an updated-object log entry (previous STM word)
‣ Upon update, the old value is maintained in an undo log
‣ Logs are allocated in chunks maintained by the STM
‣ Discarded at end of transaction
[Figure labels: Reads, Updates, List VTable, Node VTable, Head, Tail, Sum = 42, Value = 10, Next = null, STM words v100 and v90, updated-object log entry, transaction manager, offset in log chunk, chunks.]
[Harris, Plesko, Shinnar, Tarditi, Optimizing memory transactions, PLDI 06]

What is log reuse, and is it enough?
Log Reuse
‣ At transaction end, preserve log chunks in a pool rather than leaving them for the GC
‣ When a transaction needs a new log chunk, try the pool first, otherwise allocate
Issues
‣ Hard to decide when to deallocate log chunks
‣ Large initializing Txs followed by small ones will result in a large unused pool
‣ The pool will be traced by the GC
‣ Weak references are expensive
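The log-reuse scheme can be sketched roughly as follows. This is a minimal Java illustration, not the Bartok implementation: the names LogChunk, LogChunkPool and maxPooled are hypothetical, the pool is assumed to be per-thread (so it needs no locking), and the fixed cap is just one assumed answer to the "when to deallocate" issue listed above.

```java
import java.util.ArrayDeque;

// Hypothetical fixed-size log chunk; the real Bartok chunk layout is not shown in the slides.
final class LogChunk {
    final long[] entries;
    int used; // number of entries written so far

    LogChunk(int capacity) {
        this.entries = new long[capacity];
    }
}

// Per-thread pool of log chunks (assumption: one pool per thread, so no locking).
final class LogChunkPool {
    private final ArrayDeque<LogChunk> free = new ArrayDeque<>();
    private final int chunkCapacity;
    private final int maxPooled; // assumed cap that bounds the unused pool
    int freshAllocations;        // chunks that had to come from the normal heap

    LogChunkPool(int chunkCapacity, int maxPooled) {
        this.chunkCapacity = chunkCapacity;
        this.maxPooled = maxPooled;
    }

    // A transaction needs a new chunk: try the pool first, otherwise allocate.
    LogChunk acquire() {
        LogChunk chunk = free.pollFirst();
        if (chunk == null) {
            freshAllocations++;
            chunk = new LogChunk(chunkCapacity);
        }
        return chunk;
    }

    // At transaction end: keep the chunk for reuse unless the pool is already full.
    void release(LogChunk chunk) {
        chunk.used = 0;
        if (free.size() < maxPooled) {
            free.addFirst(chunk);
        }
        // Otherwise the reference is dropped and the chunk is left to the GC.
    }

    int pooled() {
        return free.size();
    }
}
```

Note that the pooled chunks still live in the collected heap, so the collector keeps tracing them; the cap limits the size of the pool but does not remove that cost.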

Dedicated Nurseries
Generational-GC nurseries
‣ Most objects either die young or live forever
Transaction nurseries
‣ Objects allocated in transaction not visible to other threads until commit
‣ Reclaim nursery in one step after abort
‣ Can support nested transactions
‣ Finalization?
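One way to picture one-step reclamation and nested transactions is a bump-pointer region with a stack of saved positions. The Java sketch below is illustrative only: TxNursery and its methods are hypothetical names, it tracks offsets rather than real object memory, and it leaves open what happens to surviving objects at the outermost commit (the slides do not say).

```java
import java.util.ArrayDeque;

// Hypothetical per-transaction nursery modelled as a bump-pointer region.
final class TxNursery {
    private final int capacity;
    private int top; // next free offset (the bump pointer)
    private final ArrayDeque<Integer> marks = new ArrayDeque<>(); // value of top at each begin()

    TxNursery(int capacity) {
        this.capacity = capacity;
    }

    // Start a (possibly nested) transaction by remembering the current bump pointer.
    void begin() {
        marks.push(top);
    }

    // Bump allocation: returns the offset of the new object, or -1 if the nursery is full.
    int allocate(int size) {
        if (size < 0 || size > capacity - top) {
            return -1;
        }
        int offset = top;
        top += size;
        return offset;
    }

    // Abort: everything allocated since the matching begin() is reclaimed in one step.
    void abort() {
        top = marks.pop();
    }

    // Commit: allocations stay; a nested commit simply merges into the parent transaction.
    void commit() {
        marks.pop();
    }

    int used() {
        return top;
    }

    int depth() {
        return marks.size();
    }
}
```

Aborting a nested transaction only rewinds to the inner mark, so the parent's allocations survive; aborting the outermost transaction empties the nursery.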

Dedicated Nurseries

Dedicated Heap
‣ Much of transactional allocation is logs
‣ Lifetime of logs bounded by the transaction they serve
‣ Known lifetime allows manual memory management
‣ Cheap to allocate/free in chunks from a mutexed freelist
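The mutexed freelist idea might look roughly like this in Java. The sketch makes several assumptions that are not in the slides: direct ByteBuffers stand in for memory outside the collected heap, log entries are plain longs, and the names DedicatedLogHeap and TxLog are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical dedicated heap: a fixed set of chunks handed out from a locked freelist.
final class DedicatedLogHeap {
    private final ArrayDeque<ByteBuffer> freeList = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();

    DedicatedLogHeap(int chunkCount, int chunkBytes) {
        for (int i = 0; i < chunkCount; i++) {
            // Assumption: direct buffers model memory that the GC does not manage.
            freeList.push(ByteBuffer.allocateDirect(chunkBytes));
        }
    }

    // Returns a chunk, or null when the dedicated heap is exhausted.
    ByteBuffer allocChunk() {
        lock.lock();
        try {
            return freeList.poll();
        } finally {
            lock.unlock();
        }
    }

    // Manual free: all chunks of a finished transaction go back under one lock acquisition.
    void freeChunks(List<ByteBuffer> chunks) {
        lock.lock();
        try {
            for (ByteBuffer chunk : chunks) {
                chunk.clear(); // reset position and limit for the next user
                freeList.push(chunk);
            }
        } finally {
            lock.unlock();
        }
    }

    int available() {
        lock.lock();
        try {
            return freeList.size();
        } finally {
            lock.unlock();
        }
    }
}

// Hypothetical transaction log whose storage lives only as long as the transaction.
final class TxLog {
    private final DedicatedLogHeap heap;
    private final List<ByteBuffer> chunks = new ArrayList<>();

    TxLog(DedicatedLogHeap heap) {
        this.heap = heap;
    }

    void append(long entry) {
        ByteBuffer current = chunks.isEmpty() ? null : chunks.get(chunks.size() - 1);
        if (current == null || current.remaining() < Long.BYTES) {
            current = heap.allocChunk();
            if (current == null) {
                throw new IllegalStateException("dedicated heap exhausted");
            }
            chunks.add(current);
        }
        current.putLong(entry);
    }

    // Commit or abort: the log's lifetime ends here, so its chunks are freed manually.
    void end() {
        heap.freeChunks(chunks);
        chunks.clear();
    }

    int chunkCount() {
        return chunks.size();
    }
}
```

Because the lock is taken once per chunk allocation and once per transaction end, rather than once per log entry, contention on the freelist stays low in this sketch.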

Dedicated Heap

Does it work?
[Figure: C# GCBench, % total speedup vs. list size. 8-core, 1.60GHz Intel Xeon E5310. 8GB RAM. Physical Address Extension enabled, Windows Server 2003 SP2.]

Does it work?
[Figure: C# STMBench7, ops per second vs. # of threads.]

Does it work?
[Figure: C# STMBench7, % increase in ops per second vs. # of threads.]

LeeTM, issues
Issues
‣ Allocates a temp grid within the transaction
‣ Bartok STM does not optimize multidimensional array access (excessive logging)
Workarounds
‣ Allocate before the transaction (Opt)
‣ Use RowMajor array access (RM)

Results (cont’d)
[Figure: C# LeeTM, total time (seconds) vs. # of threads.]

Results (cont’d)
[Figure: C# LeeTM, % of time saved vs. # of threads. Running on an 8-core, 1.60GHz Intel Xeon E5310 with 8GB of RAM and Physical Address Extension enabled, running Windows Server 2003 SP2.]

Conclusion
Memory Usage
‣ Same overall allocated memory
‣ Fewer demands on the GCed heap
Speed Up
Applying to other systems
‣ Not for library-based STM systems, but with runtime support it will work with most STM flavors
