SLIDE 1

Window-Based Greedy Contention Management for Transactional Memory

Gokarna Sharma (LSU) Brett Estrade (Univ. of Houston) Costas Busch (LSU)

DISC 2010 - 24th International Symposium on Distributed Computing

SLIDE 2

Transactional Memory - Background

  • The emergence of multi-core architectures
    – Opportunities and challenges
  • How to handle access to shared data?
    – Locks, monitors, …
  • Transactional memory (TM) is an alternative synchronization abstraction
    – Simple, composable, …
  • Three types: hardware, software, and hybrid TMs
    – Our focus is on software TM (STM) systems

SLIDE 3

STM Systems

  • Progress is ensured through a contention management (CM) policy
  • If transactions modify different data
    – everything is OK
  • If transactions modify the same data
    – conflicts arise that must be resolved; this is the job of the contention management policy
  • Of particular interest are greedy contention managers
    – Transactions restart immediately after every abort

SLIDE 4

Prior Work

  • Mostly empirical evaluation
  • Theoretical analysis
    – [Guerraoui et al., PODC’05]
      • Greedy contention manager
      • Competitive ratio = O(s²) (s is the number of shared resources)
    – [Attiya et al., PODC’06]
      • Improved to O(s)
    – [Schneider & Wattenhofer, ISAAC’09]
      • RandomizedRounds contention manager
      • Competitive ratio = O(C · log n) (C is the maximum number of conflicting transactions and n is the number of transactions)
    – [Attiya & Milani, OPODIS’09]
      • Bimodal scheduler
      • Competitive ratio = O(s) (for bimodal workloads with equi-length transactions)

SLIDE 5

Our Contributions

  • Execution window model for TM
  • Makespan bound of any CM algorithm based on the contention measure C within the window and the window parameters M and N
  • Two new randomized contention management algorithms that are very close to O(s)-competitive
  • An adaptive version that adapts to the amount of contention C

[Figure: an M × N execution window; rows are threads 1, …, M and columns are transactions 1, …, N]

SLIDE 6

Roadmap

  • Previous TM models and problem complexity
  • Our TM model
  • Our algorithms and proof ideas

SLIDE 7

Previous TM Models

  • One-shot scheduling problem
    – n transactions, a single transaction per thread
    – Best bound proven to be achievable is O(s)
  • Problem complexity: directly related to vertex coloring
    – Coloring problem -> one-shot scheduling problem -> one-shot scheduling solution -> coloring solution
  • It is NP-hard to approximate an optimal vertex coloring
  • Can we do better under the limitations of the coloring reduction?
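The reduction above ties one-shot scheduling to vertex coloring: transactions are vertices, conflicts are edges, and a proper coloring is a conflict-free assignment of transactions to time slots. A minimal Python sketch of that correspondence, assuming unit-length transactions and treating any shared resource as a conflict. The helper names and the example instance are hypothetical, and greedy coloring is used only as a simple heuristic, not as one of the contention managers discussed in these slides.

```python
from collections import defaultdict


def conflict_graph(resources):
    """Connect two transactions when they share at least one resource."""
    edges = defaultdict(set)
    for a in resources:
        for b in resources:
            if a != b and resources[a] & resources[b]:
                edges[a].add(b)
    return edges


def greedy_schedule(resources):
    """Give each transaction the smallest time slot not used by a neighbour."""
    edges = conflict_graph(resources)
    slot = {}
    for t in resources:  # insertion order; any order yields a valid schedule
        taken = {slot[u] for u in edges[t] if u in slot}
        slot[t] = next(s for s in range(len(resources)) if s not in taken)
    return slot


# Hypothetical one-shot instance: transaction name -> resources it modifies.
example = {"T1": {"x"}, "T2": {"x", "y"}, "T3": {"y"}, "T4": {"z"}}
schedule = greedy_schedule(example)
makespan_slots = max(schedule.values()) + 1  # number of colors used
print(schedule, makespan_slots)
```

The number of slots equals the number of colors, so a schedule with optimal makespan would yield an optimal coloring of the conflict graph, which is why the hardness of coloring carries over.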

SLIDE 8

Execution Window Model

  • An M × N window W
    – M threads with a sequence of N transactions per thread, i.e., a collection of N one-shot transaction sets

[Figure: an M × N execution window; rows are threads 1, …, M and columns are transactions 1, …, N]
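The window can be pictured as a list of per-thread transaction sequences. The sketch below is an illustration under stated assumptions, not the authors' code: each transaction is represented by the set of resources it modifies, transactions of the same thread never conflict because they run one after another, and C_i is taken to be the largest number of conflicts of any transaction in thread i, with C = max over i of C_i. The function name and the sample window are hypothetical.

```python
def contention(window):
    """Return ([C_i for each thread], C) for window[i][j] = set of resources."""
    m = len(window)
    per_thread = []
    for i in range(m):
        worst = 0
        for res in window[i]:
            # Count transactions of *other* threads that share a resource.
            count = sum(
                1
                for k in range(m) if k != i
                for other in window[k]
                if res & other
            )
            worst = max(worst, count)
        per_thread.append(worst)
    return per_thread, max(per_thread)


# Hypothetical 3 x 2 window: M = 3 threads, N = 2 transactions per thread.
window = [
    [{"x"}, {"y"}],
    [{"x"}, {"z"}],
    [{"x", "y"}, {"w"}],
]
c_i, c = contention(window)
print(c_i, c)
```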

SLIDE 9

Makespan Bounds

  • Let C denote the maximum number of transactions that conflict with any single transaction inside the window
  • Trivial makespan bounds:
    – Straightforward upper bound: τ · min(CN, MN), where τ is the execution time of a single transaction
    – One-shot analysis bound [Attiya et al., PODC’06]: O(sN)
    – Using RandomizedRounds [Schneider & Wattenhofer, ISAAC’09] N times: O(τ · CN log M)
  • Our bounds:
    – Offline-Greedy: makespan = O(τ · (C + N log(MN))) and competitive ratio = O(s + log(MN)), with high probability
    – Online-Greedy: makespan = O(τ · (C log(MN) + N log²(MN))) and competitive ratio = O(s · log(MN) + log²(MN)), with high probability
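To get a feel for how these expressions compare, the sketch below evaluates them for one hypothetical parameter choice. It assumes base-2 logarithms and sets every constant hidden by the O-notation to 1, so only the relative growth is meaningful, not the absolute numbers. The function name and the parameter values are illustrative.

```python
import math


def makespan_bounds(tau, c, m, n):
    """Evaluate the makespan expressions with all hidden constants set to 1."""
    log_mn = math.log2(m * n)  # base 2 is an assumption
    return {
        "trivial": tau * min(c * n, m * n),
        "randomized_rounds_x_n": tau * c * n * math.log2(m),
        "offline_greedy": tau * (c + n * log_mn),
        "online_greedy": tau * (c * log_mn + n * log_mn ** 2),
    }


# Hypothetical high-contention window: every transaction conflicts with C = M others.
bounds = makespan_bounds(tau=1, c=4096, m=4096, n=4096)
for name, value in sorted(bounds.items(), key=lambda kv: kv[1]):
    print(f"{name:>22}: {value:,.0f}")
```

For small windows or low contention the logarithmic factors can dominate, and the trivial bound may be the smaller expression.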

SLIDE 10

Intuition

  • Random delays shift conflicting transactions relative to one another inside the window, so their execution times are less likely to coincide
  • The benefit is most apparent when conflicts are frequent among transactions in the same column and rare among transactions in different columns
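A small simulation can make this intuition concrete. The sketch assumes the extreme case in which only same-column transactions conflict, one transaction per frame per thread, and delays measured in whole frames. All names and parameter values (M, N, ALPHA, the seed) are hypothetical.

```python
import random
from collections import Counter


def frame_loads(delays, n):
    """Count, per frame, how many threads run a transaction in that frame."""
    loads = Counter()
    for q in delays:
        for j in range(n):  # the j-th transaction of a thread runs in frame q + j
            loads[q + j] += 1
    return loads


def same_column_overlap(delays):
    """Count thread pairs whose same-column transactions share a frame.

    T_ij and T_kj run in frames q_i + j and q_k + j, so they coincide
    exactly when the two threads drew the same delay.
    """
    by_delay = Counter(delays)
    return sum(k * (k - 1) // 2 for k in by_delay.values())


M, N, ALPHA = 8, 4, 4          # hypothetical window size and delay range
rng = random.Random(0)         # fixed seed for repeatability
no_delay = [0] * M
random_delay = [rng.randrange(ALPHA) for _ in range(M)]

print(same_column_overlap(no_delay))      # every pair of threads collides
print(same_column_overlap(random_delay))  # only pairs with equal delays collide
print(max(frame_loads(random_delay, N)) + 1)  # frames used by the shifted window
```

The price of the shift is a wider window: the last transaction finishes in frame max(q_i) + N − 1 rather than N − 1.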

[Figure: the M × N window with a random interval added in front of each thread's transactions, widening the window from N to N’ columns]

SLIDE 11

How it works? (1/2)

  • Random intervals: assume each thread P_i knows C_i and every transaction has the same duration τ (this assumption can be removed)
  • Conflicts: divide time into frames (each time step has size τ)
    – Frame size depends on the conflict resolution strategy of the algorithm
  • Number of frames in random intervals: each thread chooses a number q_i independently and uniformly at random from the range [0, α_i − 1], where α_i = C_i / log(MN)
  • Handling conflicts: use priorities

SLIDE 12

How it works? (2/2)

[Figure: the window divided into frames for Thread 1, …, Thread M; thread i is delayed by q_i frames, e.g. q_1 ∈ [0, α_1 − 1] with α_1 = C_1 / log(MN); F_11 is the first frame of Thread 1, where T_11 executes, and F_12 is the second frame of Thread 1, where T_12 executes]

C = max_i C_i, 1 ≤ i ≤ M

Makespan = (C / log(MN) + number of frames) × frame size = (C / log(MN) + N) × frame size

SLIDE 13

Offline-Greedy Algorithm (1/2)

  • Initialization:
    – Frames are of size Φ = Θ(τ · ln(MN)) time steps
    – Each thread P_i is initially assigned a random period of q_i ∈ [0, α_i − 1] frames, where α_i = C_i / log(MN)
    – Each transaction T_ij is assigned to frame F_ij = q_i + (j − 1)
  • Priority assignment: each transaction has one of two priorities, low or high
    – Transaction T_ij starts in low priority
    – T_ij switches to high priority in the first time step of frame F_ij and remains in high priority thereafter
  • Conflict resolution: uses the conflict graph explicitly to resolve conflicts
    – The conflict graph is dynamic and evolves as the execution of the transactions progresses
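The initialization and priority rules above can be sketched in a few lines of Python. This is an illustration under assumptions that the slide does not fix: frames and time steps are numbered from 0, α_i is rounded up and clamped to at least 1, and the natural logarithm is used. The function names are hypothetical, and the conflict-graph resolution step is not modeled.

```python
import math
import random


def assign_frames(c_per_thread, n, rng):
    """Return frames[i][j] = q_i + j for 0-based j (the slide's F_ij = q_i + (j - 1))."""
    m = len(c_per_thread)
    log_mn = math.log(m * n)
    frames = []
    for c_i in c_per_thread:
        alpha_i = max(1, math.ceil(c_i / log_mn))  # rounding is an assumption
        q_i = rng.randrange(alpha_i)               # uniform on [0, alpha_i - 1]
        frames.append([q_i + j for j in range(n)])
    return frames


def priority(frame_index, time_step, frame_size):
    """Low before the first time step of the assigned frame, high from then on."""
    return "high" if time_step >= frame_index * frame_size else "low"


# Hypothetical per-thread contention values C_i for M = 3 threads, N = 4.
frames = assign_frames([3, 10, 6], n=4, rng=random.Random(1))
print(frames)
print(priority(frame_index=2, time_step=5, frame_size=3))  # frame 2 starts at step 6
```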

SLIDE 14

Offline-Greedy Algorithm (2/2)

  • Proof intuition: with high probability, each transaction commits in its assigned frame
    – Let A’ ⊆ A denote the subset of transactions conflicting with T_ij that are in frame F_ij
      • If |A’| ≤ log(MN) − 1, then T_ij commits in frame F_ij
      • |A’| ≥ log(MN) with probability at most (1/(MN))²
  • Makespan: O(τ · (C + N log(MN))) with high probability
    – Pro: for C ≤ N log(MN), the makespan is within a log(MN) factor of optimal, since N is a trivial lower bound
    – Con: the dependency graph must be known to resolve conflicts
  • Competitive ratio: O(s + log(MN)) with high probability
    – Pro: independent of the choice of C
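The per-transaction probability above extends to the whole window through a union bound over its MN transactions, and the makespan follows by counting frames. A short derivation, assuming the failure probability stated on this slide and the frame count (C / log(MN) + N) from the "How it works?" slide:

```latex
\begin{align*}
\Pr\bigl[\text{some } T_{ij} \text{ does not commit in } F_{ij}\bigr]
  &\le \sum_{i=1}^{M}\sum_{j=1}^{N} \Pr\bigl[\,|A'| \ge \log(MN)\,\bigr]
   \le MN \cdot \frac{1}{(MN)^{2}} = \frac{1}{MN}, \\[4pt]
\text{makespan}
  &\le \Bigl(\max_{i}\alpha_i + N\Bigr)\cdot \Phi
   \le \Bigl(\frac{C}{\log(MN)} + N\Bigr)\cdot \Theta\bigl(\tau \ln(MN)\bigr)
   = O\bigl(\tau\,(C + N\log(MN))\bigr).
\end{align*}
```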

SLIDE 15

Online-Greedy Algorithm (1/2)

  • Online in the sense that it does not need to know the dependency graph to resolve conflicts
  • Similar to Offline-Greedy except for the conflict resolution strategy
  • Priority assignment
    – Each transaction has a priority vector (π^(1), π^(2))
    – π^(1) is the Boolean (low/high) priority, as in Offline-Greedy
    – π^(2) ∈ [1, M] is a random priority: a transaction chooses π^(2) uniformly at random at the start of frame F_ij and after every abort [idea from Schneider & Wattenhofer, ISAAC’09]
  • Conflict resolution
    – On a conflict of T_ij with T_kl: if π_ij^(2) < π_kl^(2) then abort(T_ij, T_kl), otherwise abort(T_kl, T_ij)
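The priority rule above can be sketched as a comparison of priority vectors. Two points are assumptions here, since the slide leaves them implicit: abort(X, Y) is read as "X aborts Y", and a high π^(1) beats a low one before π^(2) is compared. The class and function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    high: bool = False  # pi^(1): becomes True at the start of the assigned frame
    rank: int = 0       # pi^(2): random value in [1, M]

    def redraw(self, m, rng):
        """Pick a fresh pi^(2); done at the frame start and after every abort."""
        self.rank = rng.randint(1, m)

    def key(self):
        # Assumption: high priority beats low, then the smaller pi^(2) wins.
        return (0 if self.high else 1, self.rank)


def resolve(a, b, m, rng):
    """Return (winner, loser); the loser is aborted and redraws pi^(2).

    On a tie the second transaction wins, matching the "otherwise" branch.
    """
    winner, loser = (a, b) if a.key() < b.key() else (b, a)
    loser.redraw(m, rng)
    return winner, loser


rng = random.Random(0)
t1 = Txn("T11", high=True, rank=2)
t2 = Txn("T21", high=True, rank=7)
winner, loser = resolve(t1, t2, m=8, rng=rng)
print(winner.name, "aborts", loser.name, "which redraws rank", loser.rank)
```

Redrawing π^(2) after each abort keeps a transaction from losing to the same opponent indefinitely, which is the idea borrowed from RandomizedRounds.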

SLIDE 16

Online-Greedy Algorithm (2/2)

  • Proof intuition: the frame duration is now Φ’ = O(τ · log²(MN))
    – The analysis is similar to Offline-Greedy
  • Makespan: O(τ · (C log(MN) + N log²(MN))) with high probability
    – Pro: no need to know the dependency graph to resolve conflicts
    – Con: the makespan is worse than that of Offline-Greedy
  • Competitive ratio: O(s · log(MN) + log²(MN)) with high probability
    – Pro: independent of the contention measure C

SLIDE 17

Adaptive-Greedy Algorithm

  • Limitation of the Offline-Greedy and Online-Greedy algorithms
    – The values of C_i need to be known in advance
  • Adaptive-Greedy: each thread starts by guessing C_i = 1
    – Similar to the exponential back-off strategy used by Polka
    – Based on its current estimate of C_i, the thread attempts to execute the Online-Greedy algorithm
    – If a thread P_i is unable to commit transactions (a bad event), P_i assumes its choice of C_i is incorrect and starts over with C_i’ = 2 · C_i for the remaining transactions
  • The correct choice of C_i is reached within log C_i iterations
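The doubling rule can be illustrated with a toy loop. The sketch below replaces the real "bad event" detection with a hypothetical oracle that reports success once the estimate reaches the true contention, so it shows only the number of doublings, not the behaviour of Online-Greedy itself.

```python
def adaptive_guess(true_c):
    """Double the estimate of C_i until it suffices; return (estimate, doublings)."""
    estimate, doublings = 1, 0
    # Stand-in for a bad event: the estimate is still below the true contention.
    while estimate < true_c:
        estimate *= 2  # C_i' = 2 * C_i
        doublings += 1
    return estimate, doublings


for c in (1, 64, 100):
    print(c, adaptive_guess(c))
```

The estimate overshoots the true value by less than a factor of 2, and the number of doublings is the base-2 logarithm of C_i rounded up.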

SLIDE 18

Discussions

  • For variable-length transactions
    – τ in the makespan bounds is replaced with τ_max, the maximum duration of any transaction in the window
    – The competitive ratio bounds gain a factor of τ_max / τ_min, where τ_min is the minimum duration of any transaction in the window
  • Future extensions
    – Instead of a single randomization interval at the beginning of the window, insert random periods of low priority between subsequent transactions
    – Dynamic expansion and contraction of the execution window to preserve the contention measure C

SLIDE 19

Conclusions

  • Execution window model for TM
  • Two new randomized greedy CM algorithms that are very close to O(s)-competitive
  • An adaptive version of these algorithms that performs better by removing the requirement that C be known in advance
