SLIDE 1

Performance Evaluation of Adaptivity in STM

Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich

SLIDE 2

ISPASS'11 / 2011-04-12 Mathias Payer / ETH Zürich 2

Motivation

  • STM systems rely on many assumptions
  • Often contradicting for different programs
  • Statically tuned to a baseline
  • Use self-optimizing systems
  • Adapt to different workloads
  • What parameters can be adapted?
  • How to measure effectiveness?

SLIDE 3

Outline

  • Introduction
  • STM System
  • STM Baseline
  • Adaptive Parameters
  • Evaluation
  • Related work
  • Conclusion

SLIDE 4

Introduction

  • Software Transactional Memory (STM) applies transactions to memory

  • (Optimistic) concurrency control mechanism
  • Alternative to lock-based synchronization
  • Multiple concurrent threads run transactions
  • Concurrent memory modifications

SLIDE 5

Introduction

  • Concurrent transactions modify memory without synchronization

  • Transaction is verified after completion
  • Conflicts are detected and resolved
  • Changes committed for conflict-free transactions
  • Modifications only visible after commit

SLIDE 6

Introduction

withdraw {
  tmp = balance;
  tmp = tmp - 100;
  balance = tmp;
}

deposit {
  tmp = balance;
  tmp = tmp + 100;
  balance = tmp;
}

  • What happens when balance is accessed concurrently?
  • Either locking or STM is needed to ensure a correct end balance
  • STM system decides which tx is executed first

Transaction timeline: TX starts, balance in read-set, balance in write-set, conflict detection, data committed
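
The withdraw/deposit race can be made concrete with a small sketch. The Python below is illustrative only: it is not AdaptSTM code, and `VersionedCell`, `Transaction`, and `atomically` are hypothetical names. It assumes a write-back scheme with one version counter per memory word: reads record the observed version in a read-set, writes are buffered in a write-set, and commit validates the read-set before publishing the writes.

```python
import threading

class VersionedCell:
    """A memory word with a version counter (illustrative layout)."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()  # only taken during commit

class Transaction:
    def __init__(self):
        self.read_set = {}   # cell -> version observed at first read
        self.write_set = {}  # cell -> buffered new value

    def read(self, cell):
        if cell in self.write_set:            # read our own buffered write
            return self.write_set[cell]
        # Record the version before reading the value, so a racing commit
        # can only make validation fail, never pass by mistake.
        self.read_set.setdefault(cell, cell.version)
        return cell.value

    def write(self, cell, value):
        self.write_set[cell] = value          # invisible until commit

    def commit(self):
        cells = sorted(set(self.read_set) | set(self.write_set), key=id)
        for c in cells:                       # fixed order avoids deadlock
            c.lock.acquire()
        try:
            # Conflict detection: every cell we read must be unchanged.
            if any(c.version != v for c, v in self.read_set.items()):
                return False
            for c, value in self.write_set.items():
                c.value = value
                c.version += 1
            return True
        finally:
            for c in cells:
                c.lock.release()

def atomically(body):
    """Re-run body until its transaction commits without a conflict."""
    while True:
        tx = Transaction()
        body(tx)
        if tx.commit():
            return

balance = VersionedCell(1000)

def withdraw(tx):
    tx.write(balance, tx.read(balance) - 100)

def deposit(tx):
    tx.write(balance, tx.read(balance) + 100)
```

Running `withdraw` and `deposit` from several threads through `atomically` should leave the balance unchanged, because a transaction whose read-set is stale is re-executed instead of committed.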

SLIDE 7

STM Baseline

  • Many efficient STM implementations agree on important design decisions:
      – Word-based locking
      – Global locking / version table
      – Eager locking
      – (Almost) no contention management
      – Simple write-set and read-set implementations

SLIDE 8

STM Baseline

[Figure: a combined global write lock / version array and two transactions. Each transaction holds a lock list, a write hash, a read hash, a write list / buffer, and a read list / buffer.]
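
One way to picture the combined lock / version array: each memory address hashes to one slot, and each slot packs a lock flag and a version number into a single word. The sketch below assumes a shift-and-mask hash and a lock flag in the lowest bit (a layout known from TL2-style versioned locks). The constants and function names are hypothetical and not taken from AdaptSTM.

```python
TABLE_BITS = 20   # assumption: global table with 2**20 slots
SHIFT = 3         # assumption: ignore the low 3 address bits (8-byte words)
LOCK_BIT = 1      # lowest bit of an entry marks it as write-locked

def slot_index(addr, shift=SHIFT, bits=TABLE_BITS):
    """Map a memory address to a slot of the global lock/version table."""
    return (addr >> shift) & ((1 << bits) - 1)

def make_entry(version, locked=False):
    """Pack a version number and a lock flag into one table entry."""
    return (version << 1) | (LOCK_BIT if locked else 0)

def is_locked(entry):
    return (entry & LOCK_BIT) != 0

def version_of(entry):
    return entry >> 1

# Neighbouring words get separate slots with a small shift ...
print(slot_index(0x1000), slot_index(0x1008))                    # 512 513
# ... but share one slot (and therefore one lock) with a larger shift.
print(slot_index(0x1000, shift=6), slot_index(0x1008, shift=6))  # 64 64
```

Unrelated addresses that fall into the same slot are locked together, which is the over-locking cost that a larger table or a different shift trades against locality.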

SLIDE 9

Adaptive STM Parameters

  • Global adaptivity
      – Synchronization needed
      – Optimizes to global optimum
      – Averages over all concurrent transactions
  • (Thread-) local adaptivity
      – No synchronization needed
      – Limits adaptable parameters
      – Best parameters for each thread/transaction

SLIDE 10

Adaptive STM Parameters

  • Different adaptive parameters measured:
      – Size of global locking/version-table *G
      – Size of local hash-tables *L
      – Write strategy *L
      – Locality tuning for hash-functions *L
      – Contention management *L

*L – local, *G – global

SLIDE 11

Adaptive Hash-Table

  • Global hash-table: trade-off between over-locking and locality
      – Global strategy: coordinate lock collisions and over-locking between threads
      – Adapt size based on global information
  • Local hash-table: trade-off between reset cost and number of hash collisions
      – Local strategy: sample moving average of unique write locations
      – Adapt size based on trend
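
The local strategy can be sketched as follows. This is a minimal illustration, not the AdaptSTM implementation: the class name `WriteHashSizer`, the smoothing factor, and the grow/shrink thresholds are all assumptions chosen for the example. It keeps an exponential moving average of the number of unique write locations per transaction and resizes the thread-local hash table when the average drifts away from the current size.

```python
class WriteHashSizer:
    """Tracks unique writes per transaction and suggests a hash-table size."""

    def __init__(self, size=64, alpha=0.25, grow_at=0.5,
                 shrink_at=0.125, min_size=16):
        self.size = size            # current number of hash-table slots
        self.alpha = alpha          # weight of the newest sample (assumed)
        self.grow_at = grow_at      # enlarge above this fill ratio (assumed)
        self.shrink_at = shrink_at  # shrink below this fill ratio (assumed)
        self.min_size = min_size
        self.avg = 0.0              # moving average of unique writes

    def observe(self, unique_writes):
        """Feed the unique-write count of one finished transaction."""
        self.avg = self.alpha * unique_writes + (1 - self.alpha) * self.avg
        fill = self.avg / self.size
        if fill > self.grow_at:
            self.size *= 2          # too many collisions expected
            return "enlarge"
        if fill < self.shrink_at and self.size > self.min_size:
            self.size //= 2         # resetting a mostly empty table is wasteful
            return "shrink"
        return "no change"
```

Because the decision uses the smoothed trend instead of a single transaction, one unusually large or small transaction does not trigger a resize on its own.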

SLIDE 12

Adaptive Write Strategy

  • Different costs depending on strategy
      – Write-back: cheap abort, expensive commit
      – Write-through: expensive abort, cheap commit
  • Adapt strategy to per-thread workload
      – Measure abort rate
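
A per-thread selector along these lines could look like the sketch below. It is an illustration under stated assumptions, not the AdaptSTM policy: the window length, the abort-rate threshold, and the name `WriteStrategySelector` are hypothetical. The only logic taken from the cost list above is the direction of the choice: frequent aborts favor write-back (cheap abort), rare aborts favor write-through (cheap commit).

```python
class WriteStrategySelector:
    """Chooses a write strategy from the abort rate of recent transactions."""

    def __init__(self, window=100, abort_threshold=0.2):
        self.window = window                    # transactions per decision (assumed)
        self.abort_threshold = abort_threshold  # switch point (assumed)
        self.strategy = "write-back"            # assumed starting strategy
        self._commits = 0
        self._aborts = 0

    def record(self, committed):
        """Record one transaction outcome and return the current strategy."""
        if committed:
            self._commits += 1
        else:
            self._aborts += 1
        if self._commits + self._aborts == self.window:
            abort_rate = self._aborts / self.window
            if abort_rate > self.abort_threshold:
                self.strategy = "write-back"     # aborts are frequent
            else:
                self.strategy = "write-through"  # commits dominate
            self._commits = self._aborts = 0     # start a new window
        return self.strategy
```

Each thread would own one selector, so no synchronization is needed to update the counters.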

SLIDE 13

Adaptive Locality Tuning

  • Different applications have different data access patterns
  • No optimal hash function for all data accesses
  • Measure number of hash collisions for thread-local hash tables
  • Cycle through different hash functions
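
As a hedged sketch of this idea: count the collisions that the current hash function produced for one transaction's write addresses, and move on to the next candidate function when the count is too high. The three candidate functions, the collision threshold, and the names `HASH_FUNCTIONS`, `count_collisions`, and `HashTuner` are assumptions made for this example and are not taken from AdaptSTM.

```python
# Candidate hash functions (assumed): different shifts, plus an XOR fold.
HASH_FUNCTIONS = [
    lambda addr, bits: (addr >> 2) & ((1 << bits) - 1),
    lambda addr, bits: (addr >> 4) & ((1 << bits) - 1),
    lambda addr, bits: ((addr >> 2) ^ (addr >> (2 + bits))) & ((1 << bits) - 1),
]

def count_collisions(addresses, hash_fn, bits):
    """Number of distinct addresses that land in an already used slot."""
    used = set()
    collisions = 0
    for addr in set(addresses):
        slot = hash_fn(addr, bits)
        if slot in used:
            collisions += 1
        else:
            used.add(slot)
    return collisions

class HashTuner:
    """Thread-local tuner that cycles through HASH_FUNCTIONS."""

    def __init__(self, bits=6, max_collisions=4):
        self.bits = bits                      # table has 2**bits slots
        self.max_collisions = max_collisions  # tolerated collisions (assumed)
        self.index = 0                        # currently selected function

    def end_of_transaction(self, written_addresses):
        """Return True if the hash function was changed."""
        fn = HASH_FUNCTIONS[self.index]
        if count_collisions(written_addresses, fn, self.bits) > self.max_collisions:
            self.index = (self.index + 1) % len(HASH_FUNCTIONS)
            return True
        return False
```

For example, writes with a stride of 1024 bytes collide heavily under the two plain shift functions but spread out under the XOR fold, so the tuner settles on the third function after two switches.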

SLIDE 14

Adaptive Contention Management

  • No single strategy works in all environments
  • Measure contention and implement an adaptive back-off strategy
      – Wait and retry
      – Abort later
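
A possible shape for such a back-off strategy is sketched below. The adaptation rule is an assumption made for illustration (grow the retry budget when waiting paid off, halve it when the transaction had to abort anyway), and `BackoffPolicy` is a hypothetical name, not an AdaptSTM interface.

```python
import random
import threading
import time

class BackoffPolicy:
    """Thread-local 'wait and retry, abort later' policy with a retry budget."""

    def __init__(self, max_retries=4, retry_limit=16, base_delay=1e-6):
        self.max_retries = max_retries  # current retry budget (adapted)
        self.retry_limit = retry_limit  # upper bound for the budget (assumed)
        self.base_delay = base_delay    # first back-off delay in seconds (assumed)

    def acquire(self, lock):
        """Return True once the lock is taken, False if the caller should abort."""
        for attempt in range(self.max_retries + 1):
            if lock.acquire(blocking=False):
                if attempt > 0:
                    # Waiting paid off: allow a little more waiting next time.
                    self.max_retries = min(self.retry_limit, self.max_retries + 1)
                return True
            if attempt < self.max_retries:
                # Wait and retry with a randomized, exponentially growing delay.
                time.sleep(random.uniform(0, self.base_delay * 2 ** attempt))
        # Abort later: waiting did not help, so give up sooner next time.
        self.max_retries = max(1, self.max_retries // 2)
        return False
```

A transaction would call `acquire` for each write lock and abort (releasing what it already holds) as soon as one call returns `False`.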

SLIDE 15

Local Adaptive STM Parameters (for local hash-table)

[Figure: decision diagram over # writes vs. hash-table space, with the outcomes enlarge write-hash, shrink write-hash, and no change.]

SLIDE 16

Local Adaptive STM Parameters (for local hash-table)

[Figure: decision diagram over # hash collisions, with the outcomes change hash-function and no change.]

SLIDE 17

Local Adaptive STM Parameters (for local hash-table)

[Figure: combined decision diagram over # hash collisions and # writes vs. hash-table space, with the outcomes no change, change hash-function, enlarge write-hash, shrink write-hash, enlarge write-hash & change hash-function, and shrink write-hash & change hash-function.]
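
The combined diagram can be read as a small decision table over two measurements. The function below is a sketch of that reading. The thresholds and the name `decide` are assumptions, and only the outcome labels come from the slide.

```python
def decide(writes, table_size, collisions,
           grow_at=0.5, shrink_at=0.125, max_collisions=4):
    """Map (# writes vs. hash-table space, # hash collisions) to an action."""
    fill = writes / table_size
    if fill > grow_at:
        resize = "enlarge write-hash"
    elif fill < shrink_at:
        resize = "shrink write-hash"
    else:
        resize = None
    rehash = "change hash-function" if collisions > max_collisions else None
    actions = [a for a in (resize, rehash) if a is not None]
    return " & ".join(actions) if actions else "no change"

print(decide(16, 64, 0))   # no change
print(decide(40, 64, 10))  # enlarge write-hash & change hash-function
```

Both inputs are thread-local, so the decision needs no coordination with other threads.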

SLIDE 18

AdaptSTM

  • Adaptive STM system built on presented features
  • Statically tuned competitive baseline
      – Static global hash function and hash table
  • Mature and stable implementation
  • Different local adaptive parameters
      – Write-set hash function and size of hash table
      – Write-through and write-back write strategy
      – Adaptive contention management

SLIDE 19

Evaluation

  • Benchmark: STAMP 0.9.10
      – "++" configuration (increased workload for kmeans)
  • AdaptSTM version 0.5.1
  • Intel 4-core Xeon E5520 CPU
      – 8 cores @ 2.27 GHz, 12 GB RAM
  • 64-bit Ubuntu 9.04

SLIDE 20

Evaluation: Global Hash-Table

[Figure: two plots, Genome and kmeans, each run with 4 threads. x-axis: # shifts (2 to 10); y-axis: time [s]; legend entries: 2^16, 2^18, 2^20, 2^22, 2^24, 2^26.]

SLIDE 21

Evaluation: Global Adaptivity

  • Global optimizations have limited potential
      – Small optimization potential
      – High synchronization cost
  • Reasonable baseline outperforms global optimization

SLIDE 22

Evaluation: Local Adaptivity

  • Different configurations:
      – naWB: no adaptivity, use write-back
      – aWBT: adaptivity, adjust write-through / write-back
      – aWWH: aWBT plus an adaptive hash-table for the write-set
      – aWHH: aWWH plus different hash functions
      – aALL: all adaptive parameters plus Bloom filter for write-entries
  • Adaptation system starts with best 'average' parameters, improves from there

SLIDE 23

Evaluation: Local Adaptivity

  • aWBT: adaptive, write-back/-through
  • aWWH: adaptive, write-back/-through, write-hash
  • aWHH: adaptive, write-back/-through, write-hash, hash-function
  • aALL: adaptive, write-back/-through, write-hash, hash-function, Bloom filter

[Figure: two plots, kmeans and Labyrinth. x-axis: threads (1, 2, 4, 8, 16); y-axis: speedup relative to the non-adaptive configuration, in percent; series: aWBT, aWWH, aWHH, aALL.]

SLIDE 24

Evaluation: Local Adaptivity

[Figure: two plots, Genome and Vacation. x-axis: threads (1, 2, 4, 8, 16); y-axis: speedup relative to the non-adaptive configuration, in percent; series: aWBT, aWWH, aWHH, aALL.]

  • aWBT: adaptive, write-back/-through
  • aWWH: adaptive, write-back/-through, write-hash
  • aWHH: adaptive, write-back/-through, write-hash, hash-function
  • aALL: adaptive, write-back/-through, write-hash, hash-function, Bloom filter

SLIDE 25

Evaluation: Local Adaptivity

  • No single optimization works for all benchmarks
  • Combination of all options leads to best performance
  • Impressive speed-ups for individual benchmarks compared to the globally optimized case

SLIDE 26

Related Work

  • TL2 (Dice et al.): baseline STM system
  • Various related work on static tuning of global parameters (Harris, Dice, Ennals, Felber)
      – Crucial for efficient baseline
  • TinySTM (Felber et al.): adapts size and hash function of global locking table
  • ASTM (Marathe et al.): adapts lazy-eager locking strategies and different meta-formats

SLIDE 27

Conclusions

  • Adaptivity in STM is important for good performance
      – Speedups up to 10% possible
  • Global optimizations are limited
      – Low potential, high synchronization cost
  • Local optimizations tune thread-local parameters
      – High correlation with workload

SLIDE 28

Questions

  • Contact: mathias.payer@nebelwelt.net
  • Source: http://nebelwelt.net/projects/adaptSTM/


SLIDE 29

Evaluation: Global Hash-Table

[Figure: four plots, Bayes, Genome, Vacation, and kmeans, each run with 4 threads. x-axis: # shifts (2 to 10); y-axis: time [s]; legend entries: 2^16, 2^18, 2^20, 2^22, 2^24, 2^26.]

SLIDE 30

Evaluation: Global Hash-Table

[Figure: four plots, Labyrinth, Intruder, SSCA2, and YADA, each run with 4 threads. x-axis: # shifts (2 to 10); y-axis: time [s]; legend entries: 2^16, 2^18, 2^20, 2^22, 2^24, 2^26.]

SLIDE 31

STM Comparison

[Figure: four plots, Genome, Vacation, Labyrinth, and Intruder. x-axis: threads (1, 2, 4, 8, 16); y-axis: relative runtime; series: astm, tl2, tstm, tstm099.]

SLIDE 32

Evaluation: Local Adaptivity

[Figure: two plots, Bayes and SSCA2. x-axis: threads (1, 2, 4, 8, 16); y-axis: speedup relative to the non-adaptive configuration, in percent; series: aWBT, aWWH, aWHH, aALL.]

SLIDE 33

Evaluation: Local Adaptivity

[Figure: one plot, YADA. x-axis: threads (1, 2, 4, 8, 16); y-axis: speedup relative to the non-adaptive configuration, in percent; series: aWBT, aWWH, aWHH, aALL.]