Transactional Execution of Java Programs

Transactional Execution of Java Programs Brian D. Carlstrom, JaeWoong Chung, Hassan Chafi, Austen McDonald Chi Cao Minh, Lance Hammond, Christos Kozyrakis, Kunle Olukotun Computer Systems Laboratory Stanford University http://tcc.stanford.edu


SLIDE 1

Transactional Execution of Java Programs

Brian D. Carlstrom, JaeWoong Chung, Hassan Chafi, Austen McDonald Chi Cao Minh, Lance Hammond, Christos Kozyrakis, Kunle Olukotun

Computer Systems Laboratory Stanford University http://tcc.stanford.edu

SLIDE 2

Transactional Execution of Java Programs

  • Goals

Run existing Java programs using transactional memory
Require no new language constructs
Require minimal changes to program source
Compare performance of locks and transactions

  • Non-Goals

Create a new programming language
Add new transactional extensions
Run all Java programs correctly without modification

SLIDE 3

TCC Transactional Memory

  • Continuous Transactional Architecture
  • “all transactions, all the time”
  • Transactional Coherency and Consistency (TCC)
  • Replaces MESI Snoopy Cache Coherence (SCC) protocol
  • At hardware level, two classes of transactions

1. Indivisible transactions for programmer-defined atomicity
2. Divisible transactions for code outside critical regions

  • Divisible transactions can be split if convenient
  • For example, when hardware buffers overflow

SLIDE 4

Translating Java to Transactions

  • Three rules create transactions in Java programs

1. synchronized defines an indivisible transaction
2. volatile references define indivisible transactions
3. Object.wait performs a transaction commit

  • Allows us to run:
  • Histogram based on our ASPLOS 2004 paper
  • Benchmarks described in Harris and Fraser OOPSLA 2003
  • SPECjbb2000 benchmark
  • All of Java Grande (5 kernels and 3 applications)
  • Performance comparable or better in almost all cases

SLIDE 5

Defining indivisible transactions

  • synchronized blocks define indivisible transactions

    Java source                                Transactional execution

    public static void main (String args[]){
      a();                                     a();       // divisible transactions
      synchronized (x){                        COMMIT();
        b();                                   b();       // indivisible transaction
      }                                        COMMIT();
      c();                                     c();       // divisible transactions
    }                                          COMMIT();

  • We use closed nesting for nested synchronized blocks

    Java source                                Transactional execution

    public static void main (String args[]){
      a();                                     a();       // divisible transactions
      synchronized (x){                        COMMIT();
        b1();                                  b1();      //
        synchronized (y) {                                //
          b2();                                b2();      // indivisible transaction
        }                                                 //
        b3();                                  b3();      //
      }                                        COMMIT();
      c();                                     c();       // divisible transactions
    }                                          COMMIT();
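The nesting rule can be pictured as a depth counter: only entering or leaving the outermost synchronized block produces a commit. The sketch below is an illustration under that assumption, not TCC or JikesRVM code; CommitTracer and its methods are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracer (not a TCC or JikesRVM API): records where commits
// fall when nested synchronized blocks are treated as one transaction.
final class CommitTracer {
    private final List<String> events = new ArrayList<>();
    private int depth = 0; // current synchronized nesting depth

    // Entering the outermost block ends the preceding divisible transaction;
    // entering an inner block is absorbed by the enclosing transaction.
    void enter() {
        if (depth == 0) events.add("COMMIT");
        depth++;
    }

    // Only leaving the outermost block commits the indivisible transaction.
    void exit() {
        depth--;
        if (depth == 0) events.add("COMMIT");
    }

    void call(String name) { events.add(name); }

    // The final divisible transaction commits when the program ends.
    void end() { events.add("COMMIT"); }

    List<String> events() { return events; }

    // Replays the nested example from this slide.
    static List<String> nestedExample() {
        CommitTracer t = new CommitTracer();
        t.call("a");
        t.enter();      // synchronized (x) {
        t.call("b1");
        t.enter();      //   synchronized (y) {
        t.call("b2");
        t.exit();       //   }
        t.call("b3");
        t.exit();       // }
        t.call("c");
        t.end();
        return t.events();
    }
}
```

The trace for the nested example places commits only around the outer block, so b1, b2, and b3 fall into a single indivisible transaction.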

SLIDE 6

Coping with condition variables

  • In our execution, Object.wait commits the transaction
  • Why not rollback transaction on Object.wait?

This is the approach of Conditional Critical Regions (CCRs) as well as Harris’s retry keyword
This does handle most common usage of condition variables

while (!condition) wait();
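A minimal sketch of this idiom in ordinary Java follows. Mailbox and sumThrough are hypothetical names chosen for illustration. Each guard is evaluated before the block writes any shared state, which is why re-executing the block from the top, as a rollback would, loses nothing.

```java
// Hypothetical one-slot mailbox; the names are illustrative, not from the talk.
final class Mailbox {
    private Integer slot = null; // null means the slot is empty

    synchronized void put(int value) throws InterruptedException {
        while (slot != null) wait(); // guard is re-checked after every wakeup
        slot = value;
        notifyAll();
    }

    synchronized int take() throws InterruptedException {
        while (slot == null) wait(); // no shared writes happen before the guard
        int value = slot;
        slot = null;
        notifyAll();
        return value;
    }

    // Sends 1..n through the mailbox from a producer thread and
    // returns the sum of the values received.
    static int sumThrough(int n) {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) box.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += box.take();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return sum;
    }
}
```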

SLIDE 7

Coping with condition variables

  • However, need Object.wait commit to run current code
  • Motivating example: A simple barrier implementation

    synchronized (lock) {
      count++;
      if (count != thread_count) {
        lock.wait();
      } else {
        count = 0;
        lock.notifyAll();
      }
    }

Code like this is found in Sun Java Tutorial, SPECjbb2000, and Java Grande

  • With rollback, all threads think they are first to barrier
  • With commit, barrier works as intended
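The difference between the two policies can be shown with a small single-threaded model of the arrival phase. This is a simplified sketch, not a simulation of TCC hardware: BarrierModel and releasers are hypothetical names, and threads are assumed to arrive one at a time.

```java
// Hypothetical single-threaded model of the barrier code on this slide.
final class BarrierModel {
    // Returns how many arriving threads take the "last arrival" branch.
    // commitOnWait = true:  Object.wait commits, so count++ persists.
    // commitOnWait = false: Object.wait rolls back, so count++ is discarded.
    static int releasers(int threadCount, boolean commitOnWait) {
        int count = 0;      // shared barrier counter
        int releasers = 0;  // threads that reset the counter and would notifyAll
        for (int t = 0; t < threadCount; t++) {
            int snapshot = count; // state at the start of the transaction
            count++;
            if (count != threadCount) {
                // lock.wait() is reached here
                if (!commitOnWait) count = snapshot; // rollback undoes the increment
            } else {
                count = 0;
                releasers++;
            }
        }
        return releasers;
    }
}
```

With four threads, the commit policy yields exactly one releasing thread, while under rollback every thread sees count go from 0 to 1 and none ever releases the barrier.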

SLIDE 8

Coping with condition variables

  • Nested transaction problem

We don’t want to commit value of “a” when we wait:

    synchronized (x) {
      a = true;
      synchronized (y) {
        while (!b)
          y.wait();
        c = true;
      }
    }

With locks, wait releases a specific lock
With transactions, wait commits all outstanding transactions
In practice, nesting examples are very rare

  • It is bad to wait while holding a lock
  • wait and notify are usually used for unnested top level coordination
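The lock-based behavior can be observed directly. The sketch below substitutes java.util.concurrent's ReentrantLock and Condition for Java monitors, because ReentrantLock exposes isLocked(); that substitution is an assumption made for observability, and NestedWaitDemo is a hypothetical name. Waiting on y's condition releases y but leaves x held, which is the "specific lock" behavior that a commit of all outstanding transactions does not reproduce.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo using ReentrantLock as a stand-in for Java monitors.
final class NestedWaitDemo {
    // Returns { x held while waiter waits, y held while waiter waits }.
    static boolean[] observe() {
        ReentrantLock x = new ReentrantLock();
        ReentrantLock y = new ReentrantLock();
        Condition bChanged = y.newCondition();
        boolean[] b = { false }; // guarded by y

        Thread waiter = new Thread(() -> {
            x.lock();
            try {
                y.lock();
                try {
                    while (!b[0]) bChanged.awaitUninterruptibly(); // releases y only
                } finally {
                    y.unlock();
                }
            } finally {
                x.unlock();
            }
        });
        waiter.start();

        // Poll until the waiter is parked on the condition.
        boolean parked = false;
        while (!parked) {
            y.lock();
            try {
                parked = y.hasWaiters(bChanged);
            } finally {
                y.unlock();
            }
            if (!parked) Thread.yield();
        }
        boolean xHeld = x.isLocked(); // still held by the waiter
        boolean yHeld = y.isLocked(); // released by the wait

        // Let the waiter finish.
        y.lock();
        try {
            b[0] = true;
            bChanged.signalAll();
        } finally {
            y.unlock();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { xHeld, yHeld };
    }
}
```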

SLIDE 9

Coping with condition variables

  • Not happy with unclean semantics

Most existing Java programs work correctly
Unfortunately no guarantee

  • Fortunately, if you prefer rollback…

Barrier code example can be rewritten to use rollback
Presumably this is generally true…
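The slides do not show the rewritten barrier, so the following is only one plausible shape, not the authors' version. It assumes the rewrite splits the barrier into two synchronized blocks: the first records the arrival and never waits, and the second only tests a guard, so rolling it back on wait discards no writes. TwoPhaseBarrier, generation, and selfCheck are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical rollback-friendly barrier; not taken from the talk.
final class TwoPhaseBarrier {
    private final Object lock = new Object();
    private final int parties;
    private int count = 0;      // arrivals in the current generation
    private int generation = 0; // bumped each time the barrier opens

    TwoPhaseBarrier(int parties) { this.parties = parties; }

    void await() throws InterruptedException {
        int arrivedIn;
        synchronized (lock) {   // block 1: record arrival, never waits
            arrivedIn = generation;
            count++;
            if (count == parties) {
                count = 0;
                generation++;
                lock.notifyAll();
            }
        }
        synchronized (lock) {   // block 2: guard only, no writes before wait
            while (generation == arrivedIn) lock.wait();
        }
    }

    // Runs several rounds and checks that no thread passes a round
    // before every thread has arrived for it.
    static boolean selfCheck(int threads, int rounds) {
        TwoPhaseBarrier barrier = new TwoPhaseBarrier(threads);
        AtomicInteger arrivals = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                try {
                    for (int r = 1; r <= rounds; r++) {
                        arrivals.incrementAndGet();
                        barrier.await();
                        if (arrivals.get() < r * threads) failures.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    failures.incrementAndGet();
                }
            });
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                failures.incrementAndGet();
            }
        }
        return failures.get() == 0;
    }
}
```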

SLIDE 10

Hardware and Software Environment

  • Simulated chip multiprocessor with TCC hardware (see PACT 2005)
  • JikesRVM

Derived from release version 2.3.4
Scheduler pinned threads to avoid context switching
Garbage Collector disabled and 1GB heap used
All necessary code precompiled before measurement
Virtual machine startup excluded from measurement

    CPU                 1-16 single issue PowerPC core
    L1                  64-KB, 32-byte cache line, 4-way associative, 1 cycle latency
    Victim Cache        8 entries fully associative
    Bus width           16 bytes
    Bus arbitration     3 pipelined cycles
    Transfer Latency    3 pipelined cycles
    L2 Cache            8MB, 8-way, 16 cycles hit time
    Main Memory         100 cycles latency, up to 8 outstanding transfers

SLIDE 11

Transactions remove lock overhead

  • SPECjbb2000 benchmark
  • Problem

Locking is used because of the 1% of operations that span two warehouses
Pay for lock overhead 100% of the time for the 1% case

  • Solution

Transactions make the common case fast; time lost to violations is not even visible in this example.

[Figure: Normalized Execution Time (%) for Locks vs. Trans. on 2, 4, 8, and 16 CPUs; legend: Busy, Lock, Violations]

SLIDE 12

Transactions keep data structures simple

  • TestHashtable

mix of read/writes to Map

  • Problem

Java has 3 basic Map classes
Which to choose?

  • HashMap

– No synchronization

  • Hashtable

– Single coarse lock

  • ConcurrentHashMap

– Fine grained locking

  • Solution

ConcurrentHashMap scales but has single CPU overhead
With transactions, just use HashMap and scale like CHM

[Figure: Speedup vs. CPUs (1, 2, 4, 8, 16) for HashMap, Hashtable, ConcurrentHashMap, and Transactional HashMap]
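A guess at the shape of such code: a plain HashMap whose operations sit inside synchronized blocks. On a lock-based JVM this behaves like the single coarse lock of Hashtable. Under the translation rules in this talk, each block would instead run as an indivisible transaction, so operations touching different entries need not serialize. GuardedMap and hammer are hypothetical names, and the benchmark's actual source is not shown in the slides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: an unsynchronized HashMap guarded by synchronized blocks.
final class GuardedMap {
    private final Map<Integer, Integer> map = new HashMap<>();

    void increment(int key) {
        synchronized (map) { // one lock today; one transaction under the talk's rules
            map.merge(key, 1, Integer::sum);
        }
    }

    int get(int key) {
        synchronized (map) {
            return map.getOrDefault(key, 0);
        }
    }

    // Several threads each perform perThread increments spread over 8 keys;
    // returns the total count across all keys.
    static int hammer(int threads, int perThread) {
        GuardedMap counts = new GuardedMap();
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) counts.increment(i % 8);
            });
            pool[t].start();
        }
        for (Thread th : pool) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        int total = 0;
        for (int k = 0; k < 8; k++) total += counts.get(k);
        return total;
    }
}
```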

SLIDE 13

Transactions can scale better with contention

  • Low Contention

Transactions have slight edge without lock overhead

  • High Contention

CHM scales to 4 CPUs but then slows
Transactions scale to 16 CPUs

[Figure: Normalized Execution Time (%) for CHM Fine vs. Trans. HM on 2, 4, 8, and 16 CPUs, in low contention and high contention panels; legend: Busy, Lock, Violations]

  • TestCompound

Atomic swap of Map elements (low and high contention experiments)
Extra lock overhead compared to TestHashtable to lock keys
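A compound operation of this kind might look like the sketch below, where a single synchronized block covers both reads and both writes. The benchmark source is not shown in the slides, so CompoundSwap, swap, and sumAfterSwaps are hypothetical names and the 16-key map is an arbitrary choice. The point of the example is that the swap is atomic as a whole without any per-key locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical compound operation: swap the values stored under two keys.
final class CompoundSwap {
    static <K, V> void swap(Map<K, V> map, K k1, K k2) {
        synchronized (map) { // both reads and both writes form one atomic unit
            V v1 = map.get(k1);
            V v2 = map.get(k2);
            map.put(k1, v2);
            map.put(k2, v1);
        }
    }

    // Runs random swaps from several threads over keys 0..15 and returns the
    // sum of all values, which atomic swaps must leave unchanged.
    static int sumAfterSwaps(int threads, int swapsPerThread) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int k = 0; k < 16; k++) map.put(k, k);
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final long seed = t;
            pool[t] = new Thread(() -> {
                Random rnd = new Random(seed);
                for (int i = 0; i < swapsPerThread; i++) {
                    swap(map, rnd.nextInt(16), rnd.nextInt(16));
                }
            });
            pool[t].start();
        }
        for (Thread th : pool) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        int sum = 0;
        synchronized (map) {
            for (int v : map.values()) sum += v;
        }
        return sum;
    }
}
```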

SLIDE 14

Java Grande Applications: MolDyn

  • MolDyn

Time spent on locks close to time lost to violations
Both scale to 8 CPUs and slow at 16 CPUs

[Figure: Normalized Execution Time (%) for Locks vs. Trans. on 2, 4, 8, and 16 CPUs; legend: Busy, Lock, Violations]

SLIDE 15

Java Grande Applications: MonteCarlo

  • MonteCarlo

Similar to SPECjbb2000 (and Histogram in paper)
Performance difference attributable to lock overhead
Both scale to 16 CPUs

[Figure: Normalized Execution Time (%) for Locks vs. Trans. on 2, 4, 8, and 16 CPUs; legend: Busy, Lock, Violations]

SLIDE 16

Java Grande Applications: RayTracer

  • RayTracer

Another contention example

  • 2 CPUs

Lock and Violation time approximately equal
Difference in Busy time attributable to commit overhead (see paper graph)

  • 4 CPUs

Overall time about equal
Lock time as percentage of overall time has increased

  • 8 CPUs

Transactions pull ahead as Lock percentage increases

  • 16 CPUs

Transactions still ahead as Lock and Violation percentage grows

[Figure: Normalized Execution Time (%) for Locks vs. Trans. on 2, 4, 8, and 16 CPUs; legend: Busy, Lock, Violations]

SLIDE 17

Transactional Execution of Java Programs

  • Goals (revisited)

Run existing Java programs using transactional memory

  • Can run a wide variety of existing benchmarks

Require no new language constructs

  • Used existing synchronized, volatile, and Object.wait

Require minimal changes to program source

  • No changes required for these programs

Compare performance of locks and transactions

  • Generally better performance from transactions