Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications
Fernando Rui (1), Márcio Castro (2), Dalvan Griebler (1), Luiz (1)
Summary
1. Introduction
2. Methodology
3. Performance Evaluation
4. Conclusions
5. References
Introduction
1. Motivation
- Multi-core applications are not embarrassingly parallel.
- Traditional synchronization structures (locks, mutexes and semaphores):
  - are low-level mechanisms;
  - cause blocking;
  - are hard to manage;
  - are vulnerable to failures and faults.
Introduction
1. Transactional Memory (TM)
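- High-level abstraction.
- Allows parallel code to be written as transactions.
- At runtime, the TM system detects conflicts between transactions and resolves them (typically by aborting and re-executing one of the conflicting transactions).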
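The idea of detecting and resolving conflicts at runtime can be illustrated with a toy software TM. This is a minimal Python sketch, not the design of any STM evaluated here: the names `TVar`, `Transaction`, `ConflictError` and `atomically` are hypothetical. It assumes lazy (commit-time) conflict detection using per-location version numbers, serializes commits with a single lock for simplicity, and resolves conflicts by aborting and re-executing the transaction.

```python
import threading


class ConflictError(Exception):
    """Raised when commit-time validation finds that a read location changed."""


class TVar:
    """A shared memory location managed by the toy STM (hypothetical name)."""

    def __init__(self, value):
        self.value = value
        self.version = 0  # incremented by every commit that writes this location


# One global lock serializes commits; real STMs use finer-grained locking.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at its first read
        self.writes = {}  # TVar -> new value, buffered until commit

    def read(self, tvar):
        if tvar in self.writes:  # read-your-own-writes
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)  # record version before value
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value  # invisible to other transactions until commit

    def commit(self):
        with _commit_lock:
            # Conflict detection: another transaction committed to a location we read.
            for tvar, seen_version in self.reads.items():
                if tvar.version != seen_version:
                    raise ConflictError()
            # No conflict: publish the buffered writes.
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) as a transaction; on conflict, abort and re-execute it."""
    while True:
        tx = Transaction()
        result = body(tx)
        try:
            tx.commit()
            return result
        except ConflictError:
            continue  # conflict resolution: discard buffered writes and retry
```

A commit can only fail because some other transaction has just committed, so the system as a whole always makes progress. Real STMs add contention management, finer-grained locking and guarantees about the consistency of reads before commit, all of which this sketch omits.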
Introduction
1. Challenge of TM systems
- What kinds of applications can really take advantage of TM?
- Why do some TM applications perform poorly?
2. Contributions of this research
- A performance evaluation of state-of-the-art STM systems and applications;
- An extension of the analysis of [1] that includes the RSTM [2] system;
- Identification of the characteristics that affect the performance of TM;
- Identification of the bottlenecks that limit the scalability of TM applications;
- Possible improvements to achieve better performance.
Methodology
1. Comparative analysis
   1. Evaluation of four state-of-the-art STM systems using the Stanford Transactional Applications for Multi-Processing (STAMP) benchmark [3];
   2. Evaluation of STM systems using EigenBench [1];
   3. Evaluation of the impact of certain transactional characteristics using EigenBench.
2. Test environment
- All experiments were performed on a Dell PowerEdge R610 machine with two quad-core Intel Xeon E5520 2.27 GHz processors (8 MB of L3 cache each) and 16 GB of shared memory.
- All results are arithmetic means of at least 30 runs, reported with a 95% confidence level.
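As a sketch of how a mean and its 95% confidence interval can be computed from 30 runs, the snippet below assumes a Student t interval with the critical value for 29 degrees of freedom (about 2.045) hard-coded. The function name `mean_with_ci` and the choice of a t interval are assumptions for illustration; the slides do not state the exact procedure used.

```python
import math
import statistics

# Two-sided 95% Student t critical value for 29 degrees of freedom,
# i.e. exactly 30 runs (assumption; use the matching value for other run counts).
T_CRIT_95_DF29 = 2.045


def mean_with_ci(run_times, t_crit=T_CRIT_95_DF29):
    """Return (mean, half_width) of the confidence interval for the mean run time."""
    n = len(run_times)
    if n < 2:
        raise ValueError("need at least two runs")
    mean = statistics.fmean(run_times)
    stdev = statistics.stdev(run_times)  # sample standard deviation (divides by n - 1)
    half_width = t_crit * stdev / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    times = [1.0] * 15 + [1.2] * 15  # hypothetical run times in seconds
    mean, hw = mean_with_ci(times)
    print(f"{mean:.3f} s +/- {hw:.3f} s")
```

A small half-width relative to the mean indicates that 30 runs were enough for stable averages; a wide interval would suggest running more repetitions.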
STM Systems Using STAMP Benchmark
1. STM systems
- Transactional Locking II (TL2) [4]: second version of the original TL;
- TinySTM [5]: uses a shared counter as a global clock to control conflicts between transactions, and locks to protect shared memory locations;
- SwissTM [6]: its main innovation is a hybrid conflict detection scheme;
- Rochester Software Transactional Memory (RSTM) [2]: reduces cache misses by employing a single level of indirection to access shared objects.
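The idea of using a shared counter as a global clock can be sketched as follows. This single-threaded Python sketch only illustrates time-based validation in general, under stated assumptions: each location stores the clock value of its last commit, a transaction snapshots the clock when it starts, and reading a location committed after that snapshot causes an abort. The class names are hypothetical, and the per-location locks that TL2 and TinySTM use during commit are omitted, so this is not the actual algorithm of either system.

```python
class Aborted(Exception):
    """Raised when a transaction observes data newer than its snapshot."""


class VersionedMemory:
    def __init__(self, size):
        self.clock = 0               # shared counter used as the global clock
        self.values = [0] * size
        self.versions = [0] * size   # clock value of the last commit to each slot


class TimeBasedTx:
    def __init__(self, mem):
        self.mem = mem
        self.rv = mem.clock          # read version: clock snapshot at start
        self.read_set = set()
        self.writes = {}             # slot -> buffered value

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        # A slot committed after our snapshot may be inconsistent with earlier reads.
        if self.mem.versions[addr] > self.rv:
            raise Aborted(f"slot {addr} was committed after this transaction started")
        self.read_set.add(addr)
        return self.mem.values[addr]

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        # Validate the read set: nothing we read may have been committed since rv.
        for addr in self.read_set:
            if self.mem.versions[addr] > self.rv:
                raise Aborted(f"read of slot {addr} is no longer valid")
        self.mem.clock += 1          # advance the global clock to get a write version
        wv = self.mem.clock
        for addr, value in self.writes.items():
            self.mem.values[addr] = value
            self.mem.versions[addr] = wv
        return wv
```

The appeal of a global clock is that a single comparison against the snapshot tells whether a location is safe to read, instead of revalidating every earlier read each time a new location is accessed.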
STM Systems Using STAMP Benchmark
1. Performance evaluation
[Figure: Speedups of the STAMP applications (bayes, genome, intruder, kmeans, labyrinth, ssca2, vacation, yada) on 2, 4 and 8 cores, one panel per STM system: SwissTM, RSTM, TinySTM and TL2.]
SwissTM vs. RSTM using EigenBench
1. Set-up:
- The two STM systems that performed best (SwissTM and RSTM);
- STAMP applications with poor (ssca2), medium (intruder and vacation) and good (labyrinth and genome) scalability;
- The evaluation is based on speedup and aborts per commit (ApC).
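The two metrics can be computed as below. This is a minimal sketch with hypothetical function names; it assumes speedup is measured against a single baseline execution time (for example the sequential run), which the slides do not specify.

```python
def speedups(times_by_cores, baseline_time):
    """Map core count -> speedup, where speedup = baseline_time / parallel_time."""
    return {cores: baseline_time / t for cores, t in sorted(times_by_cores.items())}


def aborts_per_commit(aborts, commits):
    """ApC: aborted transaction attempts divided by committed transactions."""
    if commits == 0:
        raise ValueError("no committed transactions")
    return aborts / commits


if __name__ == "__main__":
    # Hypothetical measurements, not results from the slides.
    print(speedups({2: 6.0, 4: 4.0, 8: 3.0}, baseline_time=12.0))
    print(f"ApC = {aborts_per_commit(300, 2000):.2%}")
```

ApC is a ratio, so it can be compared across applications with very different numbers of transactions; reporting it next to speedup helps separate losses caused by aborts from other scalability limits.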
2. EigenBench input parameters
Table: Characteristics of the STAMP applications
Characteristic          ssca2     intruder  vacation  labyrinth  genome
Working-set size        400 MB    20 MB     256 MB    16 MB      20 MB
Transactional length    3         24        226       357        88
Pollution               33%       5%        2%        50%        5%
Temporal locality       0.33      0.52      0.59      0.77       0.58
Contention              0.0005%   22%       0.2%      5%         0.5%
Predominance            Low       Low       High      Low        High
Density                 High      High      High      Low        High
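To make two of the table's columns concrete, the sketch below computes pollution and temporal locality for one transaction from a memory-access trace. It assumes EigenBench-style definitions (pollution as the fraction of writes among a transaction's accesses, temporal locality as the fraction of accesses that repeat an address already touched in the same transaction); the trace format and function names are hypothetical, and EigenBench's exact formulas may differ.

```python
from typing import List, Tuple

# Hypothetical trace format: one (operation, address) pair per access,
# where operation is "R" (read) or "W" (write).
Access = Tuple[str, int]


def pollution(tx: List[Access]) -> float:
    """Assumed definition: fraction of the transaction's accesses that are writes."""
    if not tx:
        raise ValueError("empty transaction")
    writes = sum(1 for op, _ in tx if op == "W")
    return writes / len(tx)


def temporal_locality(tx: List[Access]) -> float:
    """Assumed definition: fraction of accesses that hit an address seen earlier in tx."""
    if not tx:
        raise ValueError("empty transaction")
    seen = set()
    repeats = 0
    for _, addr in tx:
        if addr in seen:
            repeats += 1
        else:
            seen.add(addr)
    return repeats / len(tx)
```

For example, a three-access transaction that reads addresses 1 and 2 and then writes address 1 has pollution 1/3 and temporal locality 1/3 under these assumed definitions.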
SwissTM vs. RSTM using EigenBench (Cont.)
1. Performance evaluation
[Figure: Speedups and aborts per commit (ApC) of genome, intruder, labyrinth, ssca2 and vacation on 2, 4 and 8 cores, for SwissTM and RSTM.]
SwissTM vs. RSTM using EigenBench (Cont.)
1. Findings
- TM applications that use large amounts of memory did not perform well, since STM systems must keep track of much more data to detect conflicts;
- Variation in transactional length during execution is not handled well by most STM systems;
- Low degrees of predominance and density help TM applications perform better;
- High ApC generally limits the performance of TM applications.
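A simple cost model shows why a high ApC limits performance. Assume, as a simplification, that each aborted attempt wastes as much work as a committed transaction (in practice aborts often happen part-way through). With $A$ aborts and $C$ commits:

```latex
\mathrm{ApC} = \frac{A}{C}, \qquad
\frac{\text{attempts}}{\text{commit}} = \frac{A + C}{C} = 1 + \mathrm{ApC}, \qquad
\frac{\text{useful work}}{\text{total work}} = \frac{C}{A + C} = \frac{1}{1 + \mathrm{ApC}}
```

For example, an ApC of 0.25 gives $1/1.25 = 0.8$: under this model, about 20% of the transactional work executed is discarded.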
Evaluating the Impact of Transactional Characteristics
[Figure: Speedups of the original version and of variants V1 to V4 on 2, 4 and 8 cores. Panels: Genome (transactional length), Intruder (temporal locality), Ssca2 (working-set size) and Vacation (working-set size).]
Conclusions
About this paper
- Certain transactional characteristics (e.g. working-set size, transactional length and temporal locality) drive the performance of TM applications;
- Applications must be analysed carefully to identify the relevant characteristics.
Future Opportunities
- We intend to extend this work using tracing mechanisms such as those proposed in [7];
- We intend to study the impact of TM characteristics on the performance of TM applications executed on a real hardware TM (HTM) processor, such as Intel Haswell.
References I
[1] Sungpack Hong et al. EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics. In IEEE International Symposium on Workload Characterization (IISWC), pages 1–11, Washington, USA, 2010. IEEE Computer Society.