slide-1
SLIDE 1

Atomicity via Source-to-Source Translation

Benjamin Hindman, Dan Grossman
University of Washington

22 October 2006

slide-2
SLIDE 2

Atomic

22 October 2006 Atomicity via Source-Source Translation, MSPC2006 2

An easier-to-use and harder-to-implement primitive

Lock acquire/release:

    void deposit(int x){
      synchronized(this){
        int tmp = balance;
        tmp += x;
        balance = tmp;
      }
    }

(Behave as if) no interleaved computation:

    void deposit(int x){
      atomic {
        int tmp = balance;
        tmp += x;
        balance = tmp;
      }
    }
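
Since atomic is not valid Java, only the lock-based deposit runs on a stock JVM. The sketch below exercises that version from several threads to show the behavior an atomic block is also meant to guarantee (no lost updates). The Account class, its run helper, and the thread counts are hypothetical and not from the talk.

```java
// Hypothetical demo class; only the body of deposit() comes from the slide.
class Account {
    private int balance = 0;

    // Lock-based version: the read-modify-write runs under this object's monitor.
    void deposit(int x) {
        synchronized (this) {
            int tmp = balance;
            tmp += x;
            balance = tmp;
        }
    }

    int balance() {
        synchronized (this) {
            return balance;
        }
    }

    // Starts `threads` threads that each deposit 1 `perThread` times; returns the final balance.
    static int run(int threads, int perThread) {
        Account a = new Account();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) a.deposit(1);
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return a.balance();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // no lost updates, so 4 * 10_000
    }
}
```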

slide-3
SLIDE 3

Why the excitement?

  • Software engineering
      – No brittle object-to-lock mapping
      – Composability without deadlock
      – Simply easier to use
  • Performance
      – Parallelism unless there are dynamic memory conflicts

But how to implement it efficiently…

slide-4
SLIDE 4

This Work

Unique approach to “Java + atomic”

  1. Source-to-source compiler (then use any JVM)
  2. Ownership-based (no STM/HTM)
      – Update-in-place, rollback-on-abort
      – Threads retain ownership until contention
  3. Support “strong” atomicity
      – Detect conflicts with non-transactional code
      – Static optimization helps reduce cost

slide-5
SLIDE 5

Outline

  • Basic approach
  • Strong vs. weak atomicity
  • Benchmark evaluation
  • Lessons learned
  • Conclusion
slide-6
SLIDE 6

System Architecture

[Diagram: foo.ajava → our compiler (Polyglot) → …; our “run-time” (AThread.java, …) → javac → class files]

Note: Separate compilation or optimization

slide-7
SLIDE 7

Key pieces

  • A field read/write first acquires ownership of object
      – In transaction, a write also logs the old value
      – No synchronization if already own object
  • Some Java cleverness for efficient logging
  • Polling for releasing ownership
      – Transactions rollback before releasing
  • Lots of omitted details for other Java features
slide-8
SLIDE 8

Acquiring ownership

All objects have an owner field

    class AObject extends Object {
      Thread owner; //who owns the object
      void acq(){…} //owner=caller (blocking)
    }

Field accesses become method calls

  • Read/write barriers that acquire ownership
  • Calls simplify/centralize code (JIT will inline)
slide-9
SLIDE 9

Field accessors

    D x; // field in class C

    static D get_x(C o){
      o.acq();
      return o.x;
    }
    static D set_nonatomic_x(C o, D v) {
      o.acq();
      return o.x = v;
    }
    static D set_atomic_x(C o, D v) {
      o.acq();
      ((AThread)currentThread()).log(…);
      return o.x = v;
    }

Note: Two versions of each application method, so know which version of setter to call
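
As a rough illustration of what the translated output might look like, the sketch below puts the three accessors into a compilable form and adds the two versions of one application method. Several things here are assumptions: the acq() body only models the uncontended case, AThread.log only counts entries instead of keeping an undo log, and update_nonatomic, update_atomic and AccessorDemo are hypothetical names.

```java
// Simplified stand-ins for the run-time classes (assumptions, not the authors' code).
class AObject {
    Thread owner = Thread.currentThread(); // creating thread owns the object

    // Only the uncontended case is modelled; the slide's acq() blocks instead.
    void acq() {
        if (owner == Thread.currentThread()) return;
        throw new IllegalStateException("contended acquisition is not modelled in this sketch");
    }
}

class AThread extends Thread {
    int logged = 0; // stand-in for the undo log: counts entries only

    AThread(Runnable r) { super(r); }

    void log(Object target, Object oldValue) { logged++; }
}

class D {}

class C extends AObject {
    D x; // field in class C

    static D get_x(C o) { o.acq(); return o.x; }

    static D set_nonatomic_x(C o, D v) { o.acq(); return o.x = v; }

    static D set_atomic_x(C o, D v) {
        o.acq();
        ((AThread) Thread.currentThread()).log(o, o.x); // record the old value first
        return o.x = v;
    }

    // A source method `void update(D v) { x = v; }` gets two translated versions,
    // so each call site knows which setter to use.
    void update_nonatomic(D v) { set_nonatomic_x(this, v); }
    void update_atomic(D v) { set_atomic_x(this, v); }
}

class AccessorDemo {
    // Returns how many writes were logged: only the in-transaction version logs.
    static int run() {
        int[] logged = new int[1];
        AThread t = new AThread(() -> {
            C c = new C();
            c.update_nonatomic(new D());
            c.update_atomic(new D());
            logged[0] = ((AThread) Thread.currentThread()).logged;
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return logged[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 1
    }
}
```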

slide-10
SLIDE 10

Important fast-path

If thread already owns an object, no synchronization

  • Does not require sequential consistency
  • With “owner=currentThread()” in constructor, thread-local objects never incur synchronization

Else add object to owner’s “to release” set and wait
      – Synchronization on owner field and “to release” set
      – Also fanciness if owner is dead or blocked

    void acq(){
      if(owner==currentThread()) return;
      …
    }
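
One way the elided slow path could work is sketched below. It is a simplification under stated assumptions, not the authors' implementation: a single global monitor (LOCK) guards every hand-off, the “to release” sets live in one map, dead or blocked owners are not handled, and releaseRequested and OwnershipDemo are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AObject {
    // Assumption: one global monitor guards all hand-offs and all "to release" sets.
    static final Object LOCK = new Object();
    static final Map<Thread, Set<AObject>> toRelease = new HashMap<>();

    Thread owner = Thread.currentThread(); // creating thread owns the object

    void acq() {
        Thread me = Thread.currentThread();
        if (owner == me) return; // fast path: no synchronization
        synchronized (LOCK) {
            while (owner != null) {
                // Ask the current owner to give the object up, then wait for it to poll.
                toRelease.computeIfAbsent(owner, t -> new HashSet<>()).add(this);
                try {
                    LOCK.wait();
                } catch (InterruptedException e) {
                    throw new IllegalStateException(e);
                }
            }
            owner = me;
        }
    }

    // Called by an owning thread when it polls: give up every requested object.
    static void releaseRequested() {
        Thread me = Thread.currentThread();
        synchronized (LOCK) {
            Set<AObject> requested = toRelease.remove(me);
            if (requested == null) return;
            for (AObject o : requested) {
                if (o.owner == me) o.owner = null;
            }
            LOCK.notifyAll();
        }
    }
}

class OwnershipDemo {
    // Returns true if ownership moved from the calling thread to the other thread.
    static boolean run() {
        AObject shared = new AObject();         // owned by the calling thread
        Thread other = new Thread(shared::acq); // blocks until the owner polls
        other.start();
        while (other.isAlive()) {
            AObject.releaseRequested();         // the owner's periodic poll
            Thread.yield();
        }
        return shared.owner == other;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The fast-path read of owner is deliberately unsynchronized: a thread can only see itself in that field if it stored the value itself, and only the owner ever clears it.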

slide-11
SLIDE 11

Logging

  • Conceptually, the log is a stack of triples
      – Object, “field”, previous value
      – On rollback, do assignments in LIFO order
  • Actually use 3 coordinated arrays
  • For “field” we use singleton-object Java trickery:

    D x; // field in class C

    // one Undoer per field; it knows how to restore that field
    static Undoer undo_x = new Undoer() {
      void undo(Object o, Object v) { ((C)o).x = (D)v; }
    };

    …currentThread().log(o, undo_x, o.x);…
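
The log described above might be sketched as follows, assuming three parallel arrays that grow by doubling and an abstract Undoer class. UndoLog, its initial capacity, the use of String for the field type, and LogDemo are hypothetical choices made for illustration.

```java
import java.util.Arrays;

abstract class Undoer {
    abstract void undo(Object o, Object v);
}

// Hypothetical log: three coordinated arrays used as one stack of triples.
class UndoLog {
    private Object[] objs = new Object[8];
    private Undoer[] undoers = new Undoer[8];
    private Object[] olds = new Object[8];
    private int size = 0;

    void log(Object o, Undoer u, Object oldValue) {
        if (size == objs.length) { // grow all three arrays together
            objs = Arrays.copyOf(objs, size * 2);
            undoers = Arrays.copyOf(undoers, size * 2);
            olds = Arrays.copyOf(olds, size * 2);
        }
        objs[size] = o;
        undoers[size] = u;
        olds[size] = oldValue;
        size++;
    }

    // Undo in LIFO order so the oldest logged value is the one that survives.
    void rollback() {
        while (size > 0) {
            size--;
            undoers[size].undo(objs[size], olds[size]);
            objs[size] = null;
            undoers[size] = null;
            olds[size] = null;
        }
    }
}

class C {
    String x; // plays the role of "D x" on the slide

    // Singleton that names field x in the log.
    static final Undoer undo_x = new Undoer() {
        void undo(Object o, Object v) { ((C) o).x = (String) v; }
    };

    static String set_atomic_x(UndoLog log, C o, String v) {
        log.log(o, undo_x, o.x); // save the previous value before overwriting
        return o.x = v;
    }
}

class LogDemo {
    static String run() {
        UndoLog log = new UndoLog();
        C c = new C();
        c.x = "a";
        C.set_atomic_x(log, c, "b");
        C.set_atomic_x(log, c, "c");
        log.rollback(); // restores "b", then "a"
        return c.x;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints a
    }
}
```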

slide-12
SLIDE 12

Releasing ownership

  • Must “periodically” check “to release” set
      – If in transaction, first rollback
          • Retry later (after backoff to avoid livelock)
      – Set owners to null
  • Source-level “periodically”
      – Insert call to check() on loops and non-leaf calls
      – Trade-off synchronization and responsiveness:

    int count = 1000; //thread-local
    void check(){
      if(--count >= 0) return;
      count=1000;
      really_check();
    }
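
To make the trade-off concrete, the sketch below wraps the counter in a small class and counts how often the slow path runs. Poller, PERIOD, reallyChecks and slowChecksAfter are hypothetical names, and really_check() here only increments a counter rather than inspecting a “to release” set.

```java
// Hypothetical wrapper: one instance per thread stands in for the thread-local counter.
class Poller {
    static final int PERIOD = 1000;

    int count = PERIOD;
    int reallyChecks = 0;

    // Cheap common case: one decrement and one comparison, no synchronization.
    void check() {
        if (--count >= 0) return;
        count = PERIOD;
        really_check();
    }

    // Stand-in for the synchronized look at the "to release" set.
    void really_check() {
        reallyChecks++;
    }

    // Number of slow-path checks performed during `calls` calls to check().
    static int slowChecksAfter(int calls) {
        Poller p = new Poller();
        for (int i = 0; i < calls; i++) p.check();
        return p.reallyChecks;
    }

    public static void main(String[] args) {
        System.out.println(slowChecksAfter(1000)); // prints 0
        System.out.println(slowChecksAfter(1001)); // prints 1
    }
}
```

With this counter the slow path runs once every 1001 calls. A larger period means less synchronization but a longer wait for a thread that has asked for ownership.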

slide-13
SLIDE 13

But what about…?

Modern, safe languages are big

See paper & tech. report for: constructors, primitive types, static fields, class initializers, arrays, native calls, exceptions, condition variables, library classes, …

slide-14
SLIDE 14

Outline

  • Basic approach
  • Strong vs. weak atomicity
  • Benchmark evaluation
  • Lessons learned
  • Conclusion
slide-15
SLIDE 15

Strong vs. weak

  • Strong: atomic not interleaved with any other code
  • Weak: semantics less clear
      – “If atomic races with non-atomic code, undefined”
          • Okay for C++, non-starter for safe languages
      – Atomic and non-atomic code can be interleaved
          • For us, remove read/write barriers outside transactions
  • One common view: strong what you want, but too expensive in software
      – Present work offers (only) a glimmer of hope

slide-16
SLIDE 16

Examples

    atomic {
      x=null;
      if(x!=null) x.f=42;
    }

    atomic {
      print(x);
      x=secret_password;
      //compute with x
      x=null;
    }

slide-17
SLIDE 17

Optimization

Static analysis can remove barriers outside transactions

  • In the limit, “strong for the price of weak”

[Diagram labels: Thread local, Immutable, Not used in atomic]

  • This work: Type-based alias information
  • Ongoing work: Using real points-to information
slide-18
SLIDE 18

Outline

  • Basic approach
  • Strong vs. weak atomicity
  • Benchmark evaluation
  • Lessons learned
  • Conclusion
slide-19
SLIDE 19

Methodology

  • Changed small programs to use atomic (manually checking it made sense)
      – 3 modes: “weak”, “strong-opt”, “strong-noopt”
      – And original code compiled by javac: “lock”
  • All programs take variable number of threads
      – Today: 8 threads on an 8-way Xeon with the HotSpot JVM, lots of memory, etc.
      – More results and microbenchmarks in the paper
  • Report slowdown relative to lock-version and speedup relative to 1 thread for same-mode

slide-20
SLIDE 20

A microbenchmark

crypt:
      – Embarrassingly parallel array processing
      – No synchronization (just a main Thread.join)

                            lock    weak    strong-opt    strong-noopt
    slowdown vs. lock       -       1.1x    1.1x          15.0x
    speedup vs. 1 thread    5x      5x      5x            0.7x

  • Overhead 10% without read/write barriers
      – No synchronization (just a main Thread.join)
  • Strong-noopt a false-sharing problem on the array
      – Word-based ownership often important

slide-21
SLIDE 21

TSP

A small clever search procedure with irregular contention and benign purposeful data races
      – Optimizing strong cannot get to weak

                            lock    weak    strong-opt    strong-noopt
    slowdown vs. lock       -       2x      11x           21x
    speedup vs. 1 thread    4.5x    2.8x    1.4x          1.4x

Plusses:

  • Simple optimization gives 2x straight-line improvement
  • Weak “not bad” considering source-to-source
slide-22
SLIDE 22

Outline

  • Basic approach
  • Strong vs. weak atomicity
  • Benchmark evaluation
  • Lessons learned
  • Conclusion
slide-23
SLIDE 23

Some lessons

  1. Need multiple-readers (cf. reader-writer locks) and flexible ownership granularity (e.g., array words)
  2. High-level approach great for prototyping, debugging
      – But some pain appeasing Java’s type-system
  3. Focus on synchronization/contention (see (2))
      – Straight-line performance often good enough
  4. Strong-atomicity optimizations doable but need more
  5. Modern language features a fact of life
slide-24
SLIDE 24

Related work

Prior software implementations one of:

  • Optimistic reads and writes + weak-atomicity
  • Optimistic reads, own for writes + weak-atomicity
  • For uniprocessors (no barriers)

All use low-level libraries and/or code-generators

Hardware:

  • Strong atomicity via cache-coherence technology
  • We need a software and language-design story too
slide-25
SLIDE 25

Conclusion

Atomicity for Java via source-to-source translation and object-ownership
      – Synchronization only when there’s contention

Techniques that apply to other approaches, e.g.:

  • Retain ownership until contention
  • Optimize strong-atomicity barriers

The design space is large and worth exploring
      – Source-to-source not a bad way to explore

slide-26
SLIDE 26

To learn more

  • Washington Advanced Systems for Programming
      – wasp.cs.washington.edu
  • First-author: Benjamin Hindman
      – B.S. in December 2006
      – Graduate-school bound
      – This is just 1 of his research projects

slide-27
SLIDE 27

[ Presentation ends here ]

slide-28
SLIDE 28

Not-used-in-atomic

This work: Type-based analysis for not-used-in-atomic

  • If field f never accessed in atomic, remove all barriers on f outside atomic
  • (Also remove write-barriers if only read-in-atomic)
  • Whole-program, linear-time

Ongoing work:

  • Use real points-to information
      – Present work undersells the optimization’s worth
  • Compare value to thread-local
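
The first rule of the type-based analysis (a field never accessed in atomic needs no barriers outside atomic) could be sketched as one linear pass over all field accesses in the program. The Access record and the names fieldsUsedInAtomic and needsBarrier are hypothetical, fields are identified by a "Class.field" string purely for illustration, and the read-only refinement from the slide is not modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical summary of one field access found in the whole program.
class Access {
    final String field;     // e.g. "C.x": declaring class plus field name
    final boolean inAtomic; // true if the access occurs inside an atomic block

    Access(String field, boolean inAtomic) {
        this.field = field;
        this.inAtomic = inAtomic;
    }
}

class NotUsedInAtomic {
    // One pass over all accesses: collect the fields touched inside atomic.
    static Set<String> fieldsUsedInAtomic(List<Access> program) {
        Set<String> used = new HashSet<>();
        for (Access a : program) {
            if (a.inAtomic) used.add(a.field);
        }
        return used;
    }

    // Accesses inside atomic keep their barriers; accesses outside atomic
    // keep them only if the field is used in some atomic block.
    static boolean needsBarrier(Access a, Set<String> usedInAtomic) {
        return a.inAtomic || usedInAtomic.contains(a.field);
    }

    public static void main(String[] args) {
        Access xInside = new Access("C.x", true);
        Access xOutside = new Access("C.x", false);
        Access yOutside = new Access("C.y", false);
        List<Access> program = Arrays.asList(xInside, xOutside, yOutside);

        Set<String> used = fieldsUsedInAtomic(program);
        System.out.println(needsBarrier(xOutside, used)); // prints true
        System.out.println(needsBarrier(yOutside, used)); // prints false
    }
}
```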
slide-29
SLIDE 29

Strong atomicity

(behave as if) no interleaved computation

  • Before a transaction “commits”
      – Other threads don’t “read its writes”
      – It doesn’t “read other threads’ writes”
  • This is just the semantics
      – Can interleave more unobservably

slide-30
SLIDE 30

Weak atomicity

(behave as if) no interleaved transactions

  • Before a transaction “commits”
      – Other threads’ transactions don’t “read its writes”
      – It doesn’t “read other threads’ transactions’ writes”
  • This is just the semantics
      – Can interleave more unobservably

slide-31
SLIDE 31

Evaluation

Strong atomicity for Caml at little cost
      – Already assumes a uniprocessor
      – See the paper for “in the noise” performance

  • Mutable data overhead
  • Choice: larger closures or slower calls in transactions
  • Code bloat (worst-case 2x, easy to do better)
  • Rare rollback

             not in atomic    in atomic
    read     none             none
    write    none             log (2 more writes)

slide-32
SLIDE 32

Strong performance problem

Recall uniprocessor overhead:

             not in atomic    in atomic
    read     none             none
    write    none             some

With parallelism:

             not in atomic    in atomic
    read     none iff weak    some
    write    none iff weak    some

Start way behind in performance, especially in imperative languages (cf. concurrent GC)

slide-33
SLIDE 33

Not-used-in-atomic

Revisit overhead of not-in-atomic for strong atomicity, given information about how data is used in atomic

             |------------------ not in atomic ------------------|
             no atomic access    no atomic write    atomic write    in atomic
    read     none                none               some            some
    write    none                some               some            some

  • Yet another client of pointer-analysis
  • Preliminary numbers very encouraging (with Intel)
      – Simple whole-program pointer-analysis suffices