SLIDE 1

On the Performance of Window-Based Contention Managers for Transactional Memory

Gokarna Sharma and Costas Busch Louisiana State University

SLIDE 2

Agenda

  • Introduction and Motivation
  • Previous Studies and Limitations
  • Execution Window Model

➢ Theoretical Results
➢ Experimental Results

  • Conclusions and Future Directions
SLIDE 3

Retrospective

  • 1993

➢ A seminal paper by Maurice Herlihy and J. Eliot B. Moss: “Transactional Memory: Architectural Support for Lock-Free Data Structures”

  • Today

➢ Several STM/HTM implementation efforts by Intel, Sun, IBM; growing attention

  • Why TM?

➢ Traditional approaches using locks and monitors have many drawbacks: error-prone, hard to use, poor composability, …

Lock: only one thread can execute

  lock data; modify/use data; unlock data

TM: many threads can execute

  atomic { modify/use data }

SLIDE 4

Transactional Memory

  • Transactions perform a sequence of read and write operations on shared resources and appear to execute atomically
  • TM may allow transactions to run concurrently, but the results must be equivalent to some sequential execution

Example (initially x == 1, y == 2):

  T1: atomic { x = 2; y = x + 1; }
  T2: atomic { r1 = x; r2 = y; }

  T1 then T2: r1 == 2, r2 == 3
  T2 then T1: r1 == 1, r2 == 2
  Interleaved (incorrect): T2 reads x before T1's writes and y after them, giving r1 == 1, r2 == 3, which matches no serial order

  • ACI(D) properties to ensure correctness
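The serializability check above can be sketched in Python: enumerate both serial orders of T1 and T2 and test whether an observed outcome matches one of them. This is a minimal illustration; the helper names are hypothetical, not part of any TM system.

```python
from itertools import permutations

# Hypothetical helpers modelling the two atomic blocks from the slide.
def t1(state):
    # T1: atomic { x = 2; y = x + 1; }
    state["x"] = 2
    state["y"] = state["x"] + 1

def t2(state, result):
    # T2: atomic { r1 = x; r2 = y; }
    result["r1"] = state["x"]
    result["r2"] = state["y"]

def serial_outcomes():
    """(r1, r2) outcomes of every serial order of T1 and T2."""
    outcomes = set()
    for order in permutations(("T1", "T2")):
        state, result = {"x": 1, "y": 2}, {}
        for name in order:
            t1(state) if name == "T1" else t2(state, result)
        outcomes.add((result["r1"], result["r2"]))
    return outcomes

# T1-then-T2 gives (2, 3); T2-then-T1 gives (1, 2).
# The interleaved outcome (1, 3) matches neither serial order,
# so that execution is not serializable.
assert (1, 3) not in serial_outcomes()
```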

SLIDE 5

Software TM Systems

Conflicts:

➢ A contention manager decides how to resolve a conflict
➢ It aborts or delays one of the conflicting transactions

Centralized or Distributed:

➢ Each thread may have its own CM

Example (initially x == 1, y == 1):

  T1: atomic { … x = 2; }
  T2: atomic { y = 2; … x = 3; }

  T1 and T2 conflict on x. The CM can:

  ➢ Abort T1: undo its changes (set x == 1) and restart it, OR
  ➢ Abort T2: undo its changes (set y == 1) and restart it, OR
  ➢ Make one transaction wait and retry
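A minimal sketch of one way a CM can make this decision, assuming a Greedy-style timestamp rule (the older transaction wins). The class and function names are hypothetical, not DSTM2's API.

```python
class Transaction:
    def __init__(self, tid, start_time):
        self.tid = tid
        self.start_time = start_time  # priority: older (smaller) wins

def resolve_conflict(attacker, victim):
    """Timestamp rule sketch: the transaction with the older start
    time proceeds; the younger one is aborted (its writes undone, then
    it restarts) or made to wait and retry."""
    if attacker.start_time < victim.start_time:
        return ("abort", victim)  # attacker is older: victim aborts
    return ("wait", attacker)     # attacker is younger: it waits/retries

t1 = Transaction("T1", start_time=5)
t2 = Transaction("T2", start_time=9)
action, loser = resolve_conflict(t1, t2)  # T1 is older, so T2 loses
```

Restarting an aborted transaction with its original timestamp guarantees it eventually becomes the oldest and wins, which avoids starvation.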

SLIDE 6

Transaction Scheduling

The most common model:

➢ m concurrent transactions on m cores that share s objects
➢ Each transaction is a sequence of operations, and an operation takes one time unit
➢ Transaction duration is fixed

Throughput Guarantees:

➢ Makespan: the time needed to commit all m transactions
➢ Competitive ratio: makespan of my CM / makespan of optimal CM

Problem Complexity:

➢ NP-hard (related to vertex coloring)

Challenge:

➢ How to schedule transactions so that makespan is minimized?
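The connection to vertex coloring can be sketched as follows: color the conflict graph greedily, run each color class as one concurrent round, and the makespan is the number of colors times the transaction duration τ. This is an illustrative heuristic only, not an optimal scheduler (the problem is NP-hard).

```python
def greedy_schedule(conflicts, n_txns, tau=1):
    """Greedy vertex coloring of the conflict graph: transactions with
    the same color do not conflict and can run concurrently; each
    color class takes tau time, so makespan = (#colors) * tau."""
    color = {}
    for v in range(n_txns):
        used = {color[u] for u in conflicts.get(v, ()) if u in color}
        c = 0
        while c in used:  # smallest color unused by colored neighbors
            c += 1
        color[v] = c
    makespan = (max(color.values()) + 1) * tau
    return color, makespan

# Hypothetical instance: transactions 0-1 and 1-2 share objects
# (conflict edges); transaction 3 is independent.
conflicts = {0: [1], 1: [0, 2], 2: [1], 3: []}
_, makespan = greedy_schedule(conflicts, 4)  # two rounds suffice
```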

SLIDE 7

Literature

  • Lots of proposals

➢ Polka, Priority, Karma, SizeMatters, …

  • Drawbacks

➢ Some need globally shared data (e.g., a global clock)
➢ Workload dependent
➢ Many have no provable theoretical properties
  ✓ e.g., Polka – but overall good empirical performance

  • Mostly empirical evaluation using different benchmarks

➢ The choice of contention manager significantly affects performance
➢ Many do not perform well in the worst case (i.e., as contention, system size, and number of threads increase)

SLIDE 8

Literature on Theoretical Bounds

Guerraoui et al. [PODC’05]: First contention manager GREEDY with O(s²) competitive bound

Attiya et al. [PODC’06]: Bound of GREEDY improved to O(s)

Schneider and Wattenhofer [ISAAC’09]: RandomizedRounds with O(C · log m) (C is the maximum degree of a transaction in the conflict graph)

Attiya et al. [OPODIS’09]: Bimodal scheduler with O(s) bound for read-dominated workloads

Sharma and Busch [OPODIS’10]: Two algorithms with O(√s) and O(√s · log n) bounds for balanced workloads

SLIDE 9

Objectives

Scalable transactional memory scheduling:

➢ Design contention managers that exhibit both good theoretical and good empirical performance guarantees
➢ Design contention managers that scale well with system size and complexity

SLIDE 10

Execution Window Model

  • Collection of n sets of m concurrent transactions that share s objects

[Figure: an m × n execution window; m threads, each executing a sequence of n transactions]

Assuming maximum degree C in the conflict graph and execution time duration τ:

➢ Serialization upper bound: τ · min(Cn, mn)
➢ One-shot bound: O(sn) [Attiya et al., PODC’06]
➢ Using RandomizedRounds: O(τ · Cn log m)

SLIDE 11

Theoretical Results

  • Offline Algorithm: (maximal independent sets)

➢ For scheduling-with-conflicts environments, e.g., traffic intersection control, the dining philosophers problem
➢ Makespan: O(τ · (C + n log(mn))) (C is the conflict measure)
➢ Competitive ratio: O(s + log(mn)) whp

  • Online Algorithm: (random priorities)

➢ For online scheduling environments
➢ Makespan: O(τ · (C log(mn) + n log²(mn)))
➢ Competitive ratio: O(s log(mn) + log²(mn)) whp

  • Adaptive Algorithm

➢ Conflict graph and maximum degree C are both unknown
➢ Adaptively guesses C, starting from 1
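A sketch of the guessing scheme: start with a guess of 1 for the unknown degree C and double it whenever the current attempt fails, reaching the true C within O(log C) rounds. The success criterion below is a toy assumption for illustration only.

```python
def adaptive_guess(true_degree, attempt_succeeds):
    """Doubling sketch of the adaptive algorithm: C is unknown, so
    guess 1 and double the guess each time the attempt under the
    current guess fails to commit."""
    guess, rounds = 1, 0
    while True:
        rounds += 1
        if attempt_succeeds(guess, true_degree):
            return guess, rounds
        guess *= 2

# Toy success criterion (an assumption for illustration only):
# an attempt succeeds once the guess is at least the true degree.
ok = lambda guess, c: guess >= c
guess, rounds = adaptive_guess(true_degree=9, attempt_succeeds=ok)
# guesses 1, 2, 4, 8, 16: succeeds with guess 16 after 5 rounds
```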

SLIDE 12

Intuition (1)

  • Introduce random delays at the beginning of the execution window

[Figure: each thread waits a random interval before starting its window, stretching the window length from n to n']

  • Random delays help conflicting transactions shift in time, avoiding many conflicts
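The effect of random delays can be illustrated with a toy simulation: with no delay every transaction starts at time 0 and every pair collides, while spreading starts over a random interval makes simultaneous starts rare. All parameters here are hypothetical.

```python
import random

def start_collisions(m, max_delay, seed=0):
    """Count pairs of transactions that start at the same time step.
    With max_delay == 1 every transaction starts at t = 0, so all
    m*(m-1)/2 pairs collide; a larger random interval spreads the
    starts and leaves few simultaneous (hence conflicting) starts."""
    rng = random.Random(seed)
    starts = [rng.randrange(max_delay) for _ in range(m)]
    return sum(starts[i] == starts[j]
               for i in range(m) for j in range(i + 1, m))

m = 32
no_delay = start_collisions(m, max_delay=1)        # all 496 pairs collide
with_delay = start_collisions(m, max_delay=4 * m)  # far fewer collisions
```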

SLIDE 13

Intuition (2)

  • Frame-based execution to handle conflicts

[Figure: each of the m threads i waits a random delay qᵢ, then executes its transactions in fixed-size frames Fᵢ₁ … Fᵢₙ]

Makespan ≤ max{qᵢ} + (number of frames) × (frame size)
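The bound above can be evaluated directly; the numbers below are hypothetical.

```python
def frame_makespan(q, n_frames, frame_size):
    """Upper bound from the slide: every thread i first waits its
    random delay q[i], then executes its window in n_frames frames of
    frame_size steps each, so all threads finish by
    max(q) + n_frames * frame_size."""
    return max(q) + n_frames * frame_size

# Hypothetical instance: 4 threads with random delays q_i, a window of
# 3 frames, each frame 5 time steps long.
bound = frame_makespan(q=[2, 7, 4, 1], n_frames=3, frame_size=5)
# max(q) = 7, so the bound is 7 + 3 * 5 = 22
```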

SLIDE 14

Experimental Results (1)

  • Platform used

➢ Intel i7 (4-core processor) with 8GB RAM and hyperthreading on

  • Implemented window algorithms in DSTM2, an eager conflict management STM implementation

  • Benchmarks used

➢ List, RBTree, SkipList, and Vacation (from the STAMP suite)

  • Experiments were run for 10 seconds and the data plotted are averages of 6 runs

  • Contention managers used for comparison

➢ Polka – best published CM, but with no provable theoretical properties
➢ Greedy – first CM with both theoretical and empirical properties
➢ Priority – simple priority-based CM

SLIDE 15

Experimental Results (2)

Performance throughput:

➢ No of txns committed per second
➢ Measures the useful work done by a CM each time step

[Figure: committed transactions/sec vs. no of threads (5–35) for the List and SkipList benchmarks; curves: Polka, Greedy, Priority, Online, Adaptive]

SLIDE 16

Experimental Results (3)

[Figure: committed transactions/sec vs. no of threads (5–35) for the RBTree and Vacation benchmarks; curves: Polka, Greedy, Priority, Online, Adaptive]

Performance throughput:

Conclusion #1: Window CMs always improve throughput over Greedy and Priority
Conclusion #2: Throughput is comparable to Polka (and outperforms it in Vacation)

SLIDE 17

Experimental Results (4)

Aborts per commit ratio:

➢ No of txns aborted per txn committed
➢ Measures efficiency of a CM in utilizing computing resources

[Figure: no of aborts/commit vs. no of threads (5–35) for the List and SkipList benchmarks; curves: Polka, Greedy, Priority, Online, Adaptive]

SLIDE 18

Experimental Results (5)

Aborts per commit ratio:

[Figure: no of aborts/commit vs. no of threads (5–35) for the Vacation and RBTree benchmarks; curves: Polka, Greedy, Priority, Online, Adaptive]

Conclusion #3: Window CMs always reduce the no of aborts over Greedy and Priority
Conclusion #4: No of aborts is comparable to Polka (and lower in Vacation)

SLIDE 19

Experimental Results (6)

Execution time overhead:

➢ Total time needed to commit all transactions
➢ Measures scalability of a CM in different contention scenarios

[Figure: total execution time (seconds) vs. amount of contention (Low/Medium/High) for the List and SkipList benchmarks; curves: Polka, Greedy, Priority, Online, Adaptive]

SLIDE 20

Experimental Results (7)

Execution time overhead:

[Figure: total execution time (seconds) vs. amount of contention (Low/Medium/High) for the RBTree and Vacation benchmarks; curves: Polka, Greedy, Priority, Online, Adaptive]

Conclusion #5: Window CMs generally reduce execution time over Greedy and Priority (except on SkipList)
Conclusion #6: Window CMs are especially good at high contention, where the randomization overhead pays off

SLIDE 21

Future Directions

  • Encouraging theoretical and practical results
  • Plan to explore (experimental)

➢ Wasted work
➢ Repeat conflicts
➢ Average response time
➢ Average duration of committed transactions

  • Plan to do experiments using more complex benchmarks

➢ E.g., STAMP, STMBench7, and other STM implementations

  • Plan to explore (theoretical)

➢ Other contention managers with both theoretical and empirical guarantees

SLIDE 22

Conclusions

  • TM contention management is an important online scheduling problem
  • Contention managers should scale with the size and complexity of the system
  • Theoretical as well as practical performance guarantees are essential for design decisions
  • Need to explore mechanisms that scale well in other multi-core architectures:

➢ ccNUMA and hierarchical multilevel cache architectures
➢ Large-scale distributed systems

SLIDE 23

Thank you for your attention!!!