
SLIDE 1

Preemptible Atomics

Jan Vitek

Jason Baker, Antonio Cunei, Jeremy Manson, Marek Prochazka, Bin Xin Suresh Jagannathan, Jan Vitek

NSF grant CCF-0341304 and DARPA PCES.

SLIDE 2

(c) Jan Vitek 2006

Why not Lock-based Synchronization?

Challenges of programming with mutual exclusion locks:

  • avoiding data races
  • choosing lock granularity
  • enforcing lock acquisition order
  • dealing with modularity and abstraction

and in hard real-time systems:

  • bounding blocking time
  • avoiding priority inversion

SLIDE 3

Preemptible Atomics

  • A transactional concurrency control construct
  • Designed for commodity uniprocessor embedded systems
  • An alternative to locks with, e.g., the priority inheritance protocol (PIP)
  • Atomicity: all statements execute, or none do.
  • Strong isolation: high-priority threads (HPTs) preempt atomics running in low-priority threads (LPTs), and HPTs execute without observing changes performed by LPTs.

SLIDE 4

Example with Locks

```java
class ThreadPoolLane {
    synchronized leaderExec(Request task) {
        if (borrowThreadAndExec(task))
            synchronized (rQueue) {
                rQueue.enqueue(task);
                numBuffered++;
            }
        ...
    }
}

class Queue {
    final Object sObject = new Object();
    void enqueue(Object data) {
        QueueNode node = getNode();
        node.value = data;
        synchronized (sObject) {
            // enqueue the object
        }
    }
}
```

From the UCI Zen real-time ORB.

SLIDE 5

Example with Atomics

```java
class ThreadPoolLane {
    @Atomic leaderExec(final Request task) {
        if (borrowThreadAndExec(task)) {
            // braces keep both statements under the if, as in the lock version
            rQueue.enqueue(task);
            numBuffered++;
        }
        ...
    }
}

class Queue {
    @Atomic void enqueue(final Object data) {
        QueueNode node = getNode();
        node.value = data;
        // enqueue the object
    }
}
```

SLIDE 6

Related Work

Bershad, Redell, Ellis. Fast Mutual Exclusion for Uniprocessors. ASPLOS, 1992.

  • no undo

Anderson, Ramamurthy, Jeffay. Real-time Computing with Lock-Free Shared Objects. RTSS, 1995.

  • non-blocking algorithms, no language support

Herlihy et al., Harris et al., Welc et al. Software Transactional Memory, 2003-2005.

  • weak isolation

Ringenburg, Grossman. AtomCaml: First-Class Atomicity via Rollback. ICFP, 2005.

  • no real-time guarantees, simpler environment
SLIDE 7

Semantics

@Atomic method(...) { B }

  • B is logically atomic.
  • B can be preempted by a higher-priority thread.
  • If B is preempted, its updates are not observed by the HPT.
  • Nested atomics are coalesced into a single atomic.
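These semantics can be illustrated with a minimal single-threaded sketch, assuming an undo-log implementation in which each write inside an atomic first records how to restore the old value. The class and method names (`NestedAtomicLog`, `begin`, `end`, `log`, `abort`, `demo`) are hypothetical, not the Ovm API. Coalesced nesting is modeled with a depth counter: only the outermost `end()` commits, so an abort undoes the nested region's writes together with the outer ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: nested atomics coalesce into one region with one undo log. */
public final class NestedAtomicLog {
    private int depth = 0;                                    // current nesting depth
    private final List<Runnable> undoLog = new ArrayList<>(); // restore actions, oldest first

    /** Only the outermost begin() starts a new region. */
    void begin() {
        if (depth++ == 0) undoLog.clear();
    }

    /** Only the outermost end() commits; writes are already in place, so the log is dropped. */
    void end() {
        if (depth == 0) throw new IllegalStateException("end() without begin()");
        if (--depth == 0) undoLog.clear();
    }

    /** Record how to restore a location before it is overwritten. */
    void log(Runnable restore) {
        if (depth > 0) undoLog.add(restore);
    }

    /** What a preempting higher-priority thread would trigger: undo newest-first. */
    void abort() {
        for (int k = undoLog.size() - 1; k >= 0; k--) undoLog.get(k).run();
        undoLog.clear();
        depth = 0;
    }

    /** Runs an outer atomic containing a nested one; returns the final value of the cell. */
    static int demo(boolean preempted) {
        NestedAtomicLog tx = new NestedAtomicLog();
        int[] cell = {0};
        tx.begin();                        // outer @Atomic
        int old1 = cell[0];
        tx.log(() -> cell[0] = old1);
        cell[0] = 5;
        tx.begin();                        // nested @Atomic, coalesced into the outer one
        int old2 = cell[0];
        tx.log(() -> cell[0] = old2);
        cell[0] = 7;
        tx.end();                          // inner end: nothing is committed yet
        if (preempted) {
            tx.abort();                    // the whole coalesced region is undone
        } else {
            tx.end();                      // outer end: commit
        }
        return cell[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```

Running `main` prints 0 (the preempted region, including the nested write, is fully undone) and then 7 (the committed region keeps its writes). Logging `Runnable` restore actions keeps the sketch generic; a VM would more likely log (address, old value) pairs.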

SLIDE 8

PIP locks vs Atomics

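[Figure: execution timelines of high-, medium- and low-priority threads (HP, MP, LP) with critical sections a and b, in two panels: Locks with Priority Inheritance Protocol, and Atomics; the Atomics panel marks two undo steps.]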

SLIDE 9

Schedulability

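Assuming tasks are scheduled with a rate-monotonic (RM) scheme:

Theorem 1. A set of n periodic tasks τ_i, 0 ≤ i < n, is schedulable under RM iff for all i < n there exists R_i such that R_i ≤ p_i, where

$$R_i = C_i + \max_{j \in lp(i)} U_j + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{p_j} \right\rceil \left( C_j + U_i + W_i \right)$$

and hp(i) and lp(i) denote the tasks with higher and lower priority than τ_i, respectively.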
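The recurrence for R_i can be solved with the usual fixed-point iteration from response-time analysis. The sketch below assumes integer time units and tasks ordered from highest to lowest priority (shortest period first under RM). It reads C as worst-case execution time, p as period, U as the worst-case time to undo the task's atomic sections, and W as the worst-case work inside them that may have to be re-executed. These readings of U and W are assumptions, since the slide does not define them, and all names (`PreemptibleAtomicRTA`, `Task`, `responseTime`) are hypothetical.

```java
/** Hypothetical sketch of the Theorem 1 response-time test for preemptible atomics. */
public final class PreemptibleAtomicRTA {
    /** c: WCET, p: period, u: undo cost, w: re-executed work (assumed meanings). */
    record Task(long c, long p, long u, long w) {}

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;  // valid for a >= 0, b > 0
    }

    /** Iterates R_i to a fixed point; returns a value > p_i if the task misses its deadline. */
    static long responseTime(Task[] tasks, int i) {
        Task ti = tasks[i];
        long blocking = 0;                                // max over lp(i) of U_j
        for (int j = i + 1; j < tasks.length; j++) {
            blocking = Math.max(blocking, tasks[j].u());
        }
        long r = ti.c() + blocking;
        while (true) {
            long next = ti.c() + blocking;
            for (int j = 0; j < i; j++) {                 // j ranges over hp(i)
                Task tj = tasks[j];
                next += ceilDiv(r, tj.p()) * (tj.c() + ti.u() + ti.w());
            }
            // The sequence is non-decreasing, so it either converges or passes p_i.
            if (next == r || next > ti.p()) return next;
            r = next;
        }
    }

    static boolean schedulable(Task[] tasks) {
        for (int i = 0; i < tasks.length; i++) {
            if (responseTime(tasks, i) > tasks[i].p()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Task[] tasks = {
            new Task(1, 5, 1, 1),    // highest priority (shortest period)
            new Task(2, 10, 1, 1),
            new Task(3, 20, 0, 0),   // lowest priority
        };
        for (int i = 0; i < tasks.length; i++) {
            System.out.println("R" + i + " = " + responseTime(tasks, i));
        }
        System.out.println("schedulable: " + schedulable(tasks));
    }
}
```

With these made-up parameters the iteration converges to R0 = 2, R1 = 5 and R2 = 7, each within its period, so the set is reported schedulable. Note how the undo and re-execution costs of the preempted task (U_i + W_i) are charged once per preemption by each higher-priority task.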
SLIDE 10

Atomics vs. PIP | PCP

Priority Inheritance Protocol:

  • An HPT may block on multiple LPTs.
  • Deadlocks and data races are possible.
  • Non-real-time LPTs may cause unbounded blocking: a programmer error, but an easy one to make.

Priority Ceiling Protocol:

  • HPTs may still have to wait for an LPT to complete.
  • Ceilings are hard to assign with libraries and changing thread priorities.

Preemptible Atomic Regions:

  • HPTs only block for higher-level tasks.
  • At most one abort per context switch.
  • No deadlocks and no livelocks if the task set is schedulable.

SLIDE 11

Refactoring Legacy Code

Locks ⇒ Atomics is mostly straightforward: all uses of a particular lock must be turned into atomics. Consider:

```java
public class Vector extends AbstractList ... {
    @Atomic public void insertElementAt(Object o ...
    @Atomic public int size() { ...
```

N.B. This requires a preemptible and logged System.arraycopy.
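The note about System.arraycopy can be made concrete: inside an atomic, a bulk copy must record the destination range it overwrites so that an abort can restore it. The sketch below shows only the logging half, using hypothetical names (`LoggedArraycopy`, and an `insertElementAt` over a plain fixed-capacity array). It is not Ovm's implementation and does not address making the copy itself preemptible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: an arraycopy that logs the overwritten range so an abort can undo it. */
public final class LoggedArraycopy {
    static void arraycopy(Object[] src, int srcPos, Object[] dest, int destPos, int length,
                          List<Runnable> undoLog) {
        // Save the destination range before it is overwritten.
        Object[] saved = Arrays.copyOfRange(dest, destPos, destPos + length);
        undoLog.add(() -> System.arraycopy(saved, 0, dest, destPos, length));
        System.arraycopy(src, srcPos, dest, destPos, length);  // handles overlapping ranges
    }

    /** Hypothetical Vector-style insert; assumes count < data.length (spare capacity). */
    static void insertElementAt(Object[] data, int count, Object o, int index,
                                List<Runnable> undoLog) {
        arraycopy(data, index, data, index + 1, count - index, undoLog);  // shift tail right
        Object old = data[index];
        undoLog.add(() -> data[index] = old);
        data[index] = o;
    }

    /** Abort: replay the undo log newest-first. */
    static void undo(List<Runnable> undoLog) {
        for (int k = undoLog.size() - 1; k >= 0; k--) undoLog.get(k).run();
        undoLog.clear();
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", null};
        List<Runnable> log = new ArrayList<>();
        insertElementAt(data, 3, "x", 1, log);
        System.out.println(Arrays.toString(data));  // [a, x, b, c]
        undo(log);
        System.out.println(Arrays.toString(data));  // [a, b, c, null]
    }
}
```

Logging the whole destination range once per copy is simpler than logging each element store, at the cost of saving slots that may not actually change.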

SLIDE 12

Locks and Atomics

  • Atomics must coexist with PIP locks.
  • Use locks for long-lived, write-intensive methods.
  • If an HPT inside an atomic needs to acquire a lock held by an LPT: undo ⇒ boost and execute the LPT ⇒ re-execute the HPT.
  • Wait/notify can be used when needed.

SLIDE 13

IMPLEMENTATION

SLIDE 14

Implementation

A method “@Atomic f(){ x++; B(); }” is translated to:

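```java
while (true) {
    try {
        try {
            Transaction.start();
            log(x);          // record the old value of x before writing it
            x++;
            B_T();           // transactional version of B()
        } finally {
            Transaction.commit();
            break;
        }
    } catch (Retry r) {
        // undo performed by the aborting thread; loop to re-execute
    }
}
```

Here finally is implemented by catching all subclasses of Throwable. Retry is not a subclass of Throwable, so it is not caught by finally: an aborted attempt skips the commit and reaches the outer catch.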
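On a stock JVM only subclasses of Throwable can be thrown, so the translation above cannot be reproduced verbatim: its finally would also run on a Retry and commit. The single-threaded simulation below therefore swaps the try/finally for explicit catch clauses while keeping the same shape (log before each write, loop until the body completes without a Retry). The names (`AtomicRetrySketch`, `bT`, the `preemptions` counter) are hypothetical, and the undo that the preempting thread would perform is simulated inline in `abort()`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical simulation of the loop generated for "@Atomic f(){ x++; B(); }". */
public final class AtomicRetrySketch {
    /** Stand-in for the VM's Retry. Standard Java needs a Throwable, so this extends Error;
     *  the slide's Retry is deliberately NOT a Throwable. */
    static final class Retry extends Error {
        Retry() { super(null, null, false, false); }   // no stack trace needed
    }

    int x = 0;
    int attempts = 0;
    private int preemptionsLeft;                        // how many aborts to simulate
    private final Deque<Integer> undoLog = new ArrayDeque<>();

    AtomicRetrySketch(int preemptions) { this.preemptionsLeft = preemptions; }

    private void start()  { undoLog.clear(); }
    private void commit() { undoLog.clear(); }          // writes are in place; drop the log
    private void logX()   { undoLog.push(x); }          // save the old value before writing

    /** What the preempting thread does: undo the log (newest first), then force a retry. */
    private void abort() {
        while (!undoLog.isEmpty()) x = undoLog.pop();
        throw new Retry();
    }

    /** Transactional version of B(); here it only simulates a preemption point. */
    private void bT() {
        if (preemptionsLeft > 0) { preemptionsLeft--; abort(); }
    }

    void f() {
        while (true) {
            attempts++;
            try {
                start();
                logX();
                x++;
                bT();
            } catch (Retry r) {
                continue;                 // already undone: re-execute the body
            } catch (Throwable t) {
                commit();                 // other exceptions commit and propagate
                throw t;
            }
            commit();
            break;
        }
    }

    public static void main(String[] args) {
        AtomicRetrySketch s = new AtomicRetrySketch(1);
        s.f();
        System.out.println("x=" + s.x + " attempts=" + s.attempts);  // x=1 attempts=2
    }
}
```

Each simulated preemption costs one extra attempt, but x is incremented exactly once, because every aborted attempt is rolled back before the body runs again.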

SLIDE 15

Scaling-Up

  • I/O: How do you undo a write to the screen? You don't. Could support buffering of output and replay of input, or use compensations.
  • Garbage collection: addresses stored in the log need to be updated. The GC must be preemptible and cannot preempt RT tasks. Now: roll back the atomic if a GC is triggered.
  • Dynamic class loading: could generate transactional versions of methods on the fly. Now: RT code does not require dynamic class loading.
  • Reflection: methods invoked reflectively from an atomic must be transactional. This is a simple check in the implementation of the reflection package.
  • Regions: memory allocated within a region must be returned on abort to avoid leaks.
  • Asynchronous transfer of control: defer until interruptible, then abort.
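The output-buffering option for I/O can be sketched as follows: writes made inside an atomic are held in a buffer and only handed to the real device on commit, so an abort simply discards them. This is a minimal illustration assuming a string-oriented device modeled as a `Consumer<String>`; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: output inside an atomic is held back until commit. */
public final class BufferedAtomicOutput {
    private final Consumer<String> device;              // the real, irrevocable output
    private final List<String> pending = new ArrayList<>();

    BufferedAtomicOutput(Consumer<String> device) {
        this.device = device;
    }

    /** Buffered, so it is not yet visible outside the atomic. */
    void write(String s) {
        pending.add(s);
    }

    /** On commit, release the buffered output in order. */
    void commit() {
        pending.forEach(device);
        pending.clear();
    }

    /** On abort, nothing reached the device, so dropping the buffer is the undo. */
    void abort() {
        pending.clear();
    }

    public static void main(String[] args) {
        List<String> screen = new ArrayList<>();
        BufferedAtomicOutput out = new BufferedAtomicOutput(screen::add);
        out.write("attempt 1");
        out.abort();                  // preempted: this write never appears
        out.write("attempt 2");
        out.commit();
        System.out.println(screen);   // [attempt 2]
    }
}
```

The trade-off is latency: output becomes visible only when the enclosing atomic commits.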

SLIDE 16

Optimizations

Turn an atomic into a no-op:

@Atomic m() => @Uninterruptible m()

Safe iff the execution time is bounded. Heuristic: short, non-looping methods (N.B. not safe for lock-based synchronization).
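The heuristic can be sketched as a predicate over per-method facts that a compiler pass might compute. On a uniprocessor, code that cannot be preempted is trivially atomic, so no logging is needed, and a bounded execution time bounds how long higher-priority threads are delayed. The record fields, the instruction threshold, and the extra "no calls" condition (a callee could loop) are hypothetical assumptions, not details from the slides.

```java
/** Hypothetical sketch of the @Atomic => @Uninterruptible heuristic. */
public final class AtomicElision {
    /** Hypothetical per-method facts a compiler pass could compute. */
    record MethodSummary(String name, int instructionCount, boolean containsLoop, boolean makesCalls) {}

    /**
     * Short and non-looping, per the slide. Rejecting calls is an added
     * assumption: a callee could loop, which would unbound execution time.
     */
    static boolean canBeUninterruptible(MethodSummary m, int maxInstructions) {
        return !m.containsLoop() && !m.makesCalls() && m.instructionCount() <= maxInstructions;
    }

    public static void main(String[] args) {
        int limit = 32;  // hypothetical threshold, not a value from the slides
        MethodSummary getter = new MethodSummary("size", 4, false, false);
        MethodSummary scan = new MethodSummary("indexOf", 20, true, false);
        System.out.println(canBeUninterruptible(getter, limit));  // true
        System.out.println(canBeUninterruptible(scan, limit));    // false
    }
}
```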

SLIDE 17

Extensions

  • Prescient commits: exception-throwing code does not affect or rely on user-allocated heap data.
  • Open nesting: string interning requires that strings not be undone, as the VM kernel holds a pointer to the char array.
  • Exposed regions: operations are made visible immediately and aborts are deferred, e.g. for debugging.

SLIDE 18

Evaluation

SLIDE 19

[Figure: SpecJVM98 execution time relative to Ovm (axis 0.0 to 2.0) for compress, jess, db, javac, mpegaudio, mtrt and jack; series Ovm 1.01 RTSJ, Ovm 1.01, GCJ 4.0.2, HotSpot 1.5.0.06 and jTime 1.0; bars exceeding the axis are labeled 5.6, 2.2, 12.2 and 4.4.]

Ovm performance is competitive.

AMD Athlon XP 1900+, 1.6 GHz, 1 GB, RTLinux 2.4.7-timesys-3.1.214

SLIDE 20

HPT response times

Microbenchmarks

2 threads performing a mix of get/put operations on a HashMap
300 MHz PPC, 256 MB memory, Embedded Planet Linux
Ovm RTSJ VM, AOT, priority preemptive, PIP locks

[Figure: response time (µs, axis 250 to 700) per frame (1 to 49) for a PAR-based HashMap vs. a synchronized HashMap, in two panels: 80% reads / 20% writes, and 20% reads / 80% writes.]

SLIDE 21

[Figure: response time (ms, axis 5 to 50) per frame (1 to 281) for the low- and high-priority groups, Synchronized vs. Preemptible.]

Figure 6. RT-Zen results. Comparing the response time for a game server running on top of a Real-time Java CORBA implementation. There are two thread groups (low and high) handling 300 requests each. The y-axis indicates the time taken by the application code to process the request. Lower is better.

UCI’s RT-ZEN

Real-time CORBA ORB written in RTSJ, 179,000 LOC; ~600 synchronized statements mechanically translated to atomics

30 HPT / 70 LPT. Measure the time to process a request.

AMD Athlon XP 1900+, 1.6 GHz, 1 GB, RTLinux

SLIDE 22

PRiSMj

  • Avionics applications from the Boeing Company
  • Benchmark scenarios with different workloads / components
  • Oscillating modal behavior
  • ~100 periodic threads in three main rate groups: 1, 5, 20 Hz
  • 953 Java classes, 6616 methods
  • Deployed on a ScanEagle

SLIDE 23

PRiSMj: 1X

High responsiveness, small workloads

300 MHz PPC, 256 MB memory, Embedded Planet Linux; Ovm RTSJ VM, AOT, priority preemptive, PIP locks

[Figure: two panels of per-frame response times (y-axis 0.02 to 0.20, frames 1 to 381) for the 1 Hz, 5 Hz and 20 Hz rate groups.]

Atomics (aborts): 3'180 (0)
Reads, max (median): 514 (6)
Writes, max (median): 115 (3)
Monitors inflated: 1338

SLIDE 24

PRiSMj: 100X

Large workloads

300 MHz PPC, 256 MB memory, Embedded Planet Linux; Ovm RTSJ VM, AOT, priority preemptive, PIP locks

[Figure: two panels of per-frame response times (y-axis 0.1 to 0.7, frames 1 to 381) for the infrastructure, 1 Hz, 5 Hz and 20 Hz groups.]

Atomics (aborts): 151'438 (5)
Reads, max (median): 5'399 (3)
Writes, max (median): 1'158 (0)

SLIDE 25

Conclusions

  • Easier to write reusable, correct concurrent real-time code
  • Improves responsiveness with little impact on throughput
  • Not a replacement for locks, but another tool in the box

source code at http://ovmj.org

[Manson et al. Preemptible Atomic Regions for Real-time Java. RTSS'05]
[Baker et al. A Real-time Java Virtual Machine for Avionics. RTAS'06]