Extending Hardware Transactional Memory to Support Non-busy Waiting and Non-transactional Actions

SLIDE 1

Extending Hardware Transactional Memory to Support Non-busy Waiting and Non-transactional Actions

Craig Zilles and Lee Baugh
University of Illinois at Urbana-Champaign

Paper available at: http://www-faculty.cs.uiuc.edu/~zilles/papers/non_transact.transact2006.pdf

SLIDE 2

Two main TM thrusts

HW-centric
  • common-case performance, strong atomicity
  • implicit (avoid re-compile of libraries)
  • simple semantics
  • handling overflow

SLIDE 3

Two main TM thrusts

HW-centric
  • common-case performance, strong atomicity
  • implicit (avoid re-compile of libraries)
  • simple semantics
  • handling overflow

SW-centric
  • flexibility/extensibility, richer semantics
  • tighter integration with language/run-time
  • lower performance, weak atomicity
  • explicit (code includes transaction info)

SLIDE 4

This Paper

HW-centric
  • common-case performance, strong atomicity
  • implicit (avoid re-compile of libraries)
  • simple semantics
  • handling overflow

SW-centric
  • flexibility/extensibility, richer semantics
  • tighter integration with language/run-time
  • lower performance, weak atomicity
  • explicit (code includes transaction info)

SLIDE 5

Outline

  • Background: Virtual Transactional Memory (VTM)
  • Waiting w/o spinning:
      • “retry”
      • due to conflict (much like semaphores)
  • Pausing as a transactional loop-hole
      • accesses to contended data
      • performing non-transactional actions
      • retaining state across an abort

SLIDE 6

Virtual Transactional Memory

Goals:
  • Small XACT: entirely in cache, no overhead
  • Large XACT: ownership/undo state stored in-memory, can persist across time-slice
  • Allow both kinds to co-exist

Eager conflict detection, versioning

Transactional status word (XSW)
  • Holds transaction state (active, commit, abort)
  • Pointed to by ownership records
  • Monitored by running transaction

SLIDE 7

Retry

  • Avoid “lost wake-up” bugs
  • Composable means of “wait for multiple objects”

element *get_element_to_process() {
  TRANSACTION_BEGIN;
  for (int i = 0; i < NUM_LISTS; ++i) {
    if (list[i].has_element()) {
      element *e = list[i].get_element();
      TRANSACTION_END;
      return e;
    }
  }
  retry;
}

SLIDE 8

Implementation

  1. Ensure retry’ed transaction loses conflicts
  2. Want to de-schedule thread until conflict

VTM already supports persistent transactions
Main challenge is making sure wake-up occurs

SLIDE 9

Ensuring Wake-up

Race condition between de-scheduling and being aborted
Atomically transfer responsibility for waking the thread
  • After marking thread as blocked, add marker to XSW with compare-and-swap
  • If the compare-and-swap fails, re-schedule thread (already aborted)
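
The hand-off described on this slide can be sketched with C11 atomics. This is only an illustration under stated assumptions: the XSW bit layout, the type name xsw_t, and the functions xsw_hand_off_wakeup and xsw_abort are hypothetical and do not come from the paper or from VTM.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical XSW encoding: a state value plus one "has waiter" bit. */
enum { XSW_ACTIVE = 0, XSW_COMMITTED = 1, XSW_ABORTED = 2 };
enum { XSW_WAITER = 4 };

typedef _Atomic unsigned xsw_t;

/* Called after the thread has been marked blocked.
 * Returns true if the waiter marker was installed: whoever aborts the
 * transaction is now responsible for waking the thread.
 * Returns false if the transaction was no longer active (already aborted):
 * the caller must re-schedule the thread itself. */
static bool xsw_hand_off_wakeup(xsw_t *xsw)
{
    unsigned expected = XSW_ACTIVE;
    return atomic_compare_exchange_strong(xsw, &expected,
                                          (unsigned)(XSW_ACTIVE | XSW_WAITER));
}

/* Called by the aborting side. Returns true if a waiter must be woken. */
static bool xsw_abort(xsw_t *xsw)
{
    unsigned old = atomic_exchange(xsw, (unsigned)XSW_ABORTED);
    return (old & XSW_WAITER) != 0;
}
```

Because both sides modify the same word atomically, exactly one of them ends up owning the wake-up: either the abort sees the marker, or the blocked thread sees the abort.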

SLIDE 10

Wait on contention

Three outcomes:
  • Abort
  • Spin
  • De-schedule
      • For long transactions with low contention
      • Mitigates worst-case behavior
      • Corresponds to O/S semaphores

[Figure: timeline. T1 accesses D (successfully); T2 then tries to access D and hits a conflict.]

SLIDE 11

Implementation

  • Build a list of who waits on whom
  • Deterministic contention manager -> no cycles
  • Annotated XSW indicates there are waiters
  • Same trick to transfer wake-up responsibility

[Figure: three XSW records, each with the fields waiters, w_prev, w_next, and task. T1, T2, and T3 each have an LTSS and a task_struct; T1 is RUNNING, T2 and T3 are BLOCKED.]
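
A minimal sketch of a waiter list with a deterministic, cycle-free rule. The field names waiters, w_prev, and w_next follow the figure; everything else is an assumption made for illustration, in particular the start_time field and the "older transaction wins" rule, which the slides do not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-transaction record; only the link names come from the figure. */
typedef struct xact_s {
    unsigned long start_time;   /* assumed priority: smaller means older */
    struct xact_s *waiters;     /* head of the list of transactions waiting on this one */
    struct xact_s *w_prev;      /* previous entry in the list this transaction waits in */
    struct xact_s *w_next;      /* next entry in that list */
} xact_t;

/* Assumed deterministic rule: only a younger transaction may wait on an older one.
 * With distinct start times, waits always point from younger to older,
 * so a cycle of waiters cannot form. */
static bool may_wait_on(const xact_t *requester, const xact_t *owner)
{
    return requester->start_time > owner->start_time;
}

/* Push requester onto owner's waiter list.
 * Returns false if the rule forbids waiting (the requester is older). */
static bool wait_on(xact_t *requester, xact_t *owner)
{
    if (!may_wait_on(requester, owner))
        return false;
    requester->w_prev = NULL;
    requester->w_next = owner->waiters;
    if (owner->waiters != NULL)
        owner->waiters->w_prev = requester;
    owner->waiters = requester;
    return true;
}
```

In this sketch a non-NULL waiters pointer plays the role of the annotation on the XSW that tells the owner it has threads to wake when it finishes.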

SLIDE 12

Pausing Transactions

Providing a transactional loop-hole
  • HTM default is that everything is transactional
  • Enable violating transaction’s isolation

To avoid conflicts on highly-contended data

For performing non-transactional actions
  • Logging abort conditions, exceptions, tools

SLIDE 13

Simple Example:

[Figure: execution timeline, with each step marked as transactional or non-transactional]
  • xact_begin (try transaction)
  • xact_pause
      • increment statistic atomically (using CAS)
      • register compensation action
  • xact_unpause
  • ABORT!
      • (perform compensation) decrement statistic atomically (using CAS)
      • deallocate compensation data
  • xact_begin (retry transaction)

...
transaction {
  ...
  ...
  ++statistic;
  ...
}
...
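
The paused statistic update can be sketched in plain C11 as follows. This is a software mock-up, not the hardware mechanism: xact_pause and xact_unpause are no-op stand-ins for the hardware primitives, and pending_compensation, on_abort, and on_commit are hypothetical names standing in for the registered compensation action.

```c
#include <assert.h>
#include <stdatomic.h>

static _Atomic long statistic;

/* Add delta with a CAS loop, as the paused region on the slide does. */
static void stat_add(_Atomic long *s, long delta)
{
    long old = atomic_load(s);
    while (!atomic_compare_exchange_weak(s, &old, old + delta)) {
        /* On failure, old has been reloaded with the current value; retry. */
    }
}

/* Hypothetical stand-ins for the hardware primitives; no-ops in this sketch. */
static void xact_pause(void) {}
static void xact_unpause(void) {}

static int pending_compensation;   /* 1 if an abort must undo one increment */

/* Body of "++statistic" inside a transaction, done in a paused region. */
static void bump_statistic_in_xact(void)
{
    xact_pause();
    stat_add(&statistic, 1);
    pending_compensation = 1;      /* "register compensation action" */
    xact_unpause();
}

/* Run when the enclosing transaction aborts. */
static void on_abort(void)
{
    if (pending_compensation) {
        stat_add(&statistic, -1);  /* "perform compensation" */
        pending_compensation = 0;  /* "deallocate compensation data" */
    }
}

/* Run when the enclosing transaction commits: nothing to undo. */
static void on_commit(void)
{
    pending_compensation = 0;
}
```

The increment is visible to other threads immediately, so it causes no transactional conflict on the counter; the price is that an abort has to undo it explicitly.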

SLIDE 14

Implementation

Paused modifier to transaction state
  • Distinct from “swapped”
  • Load/stores not added to read/write set

Strong atomicity, but...
  • Allow reads to footprint (passing arguments)
  • Handling writes to footprint?
      • Clean semantics demand write through
      • Common occurrences (e.g., stack) don’t

SLIDE 15

Implementation, cont.

No atomicity/isolation guarantees
  • Must conventionally synchronize

Support registering compensation in software
  • Register function and arguments
  • Performed after commit/abort (+/- atomically)

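#include <stdbool.h>

struct comp_action_s;

/* Declared before the struct that uses it. */
typedef void (*comp_function_t)(struct comp_action_s *ca, bool do_action);

typedef struct comp_action_s {
  struct comp_action_s *next;
  comp_function_t comp_func;
  // data for compensation
} comp_action_t;

typedef struct comp_lists_s {
  comp_action_t *abort_actions;
  comp_action_t *commit_actions;
} comp_lists_t;

[Figure: a list of compensation actions, each holding a function and its data: func1 with data1a and data1b, func2 with data2.]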
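
One way the two lists might be used is sketched below. The types repeat the slide's declarations; the helpers comp_register, comp_drain, comp_finish, and undo_increment, the target payload field, and the reading of do_action as "perform the action, as opposed to only release its data" are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct comp_action_s;
typedef void (*comp_function_t)(struct comp_action_s *ca, bool do_action);

typedef struct comp_action_s {
    struct comp_action_s *next;
    comp_function_t comp_func;
    int *target;                 /* hypothetical payload for this sketch */
} comp_action_t;

typedef struct comp_lists_s {
    comp_action_t *abort_actions;
    comp_action_t *commit_actions;
} comp_lists_t;

/* Push an action onto one of the two lists (done inside a paused region). */
static void comp_register(comp_action_t **list, comp_action_t *ca)
{
    ca->next = *list;
    *list = ca;
}

/* Call every action on a list, then leave the list empty. */
static void comp_drain(comp_action_t **list, bool do_action)
{
    while (*list != NULL) {
        comp_action_t *ca = *list;
        *list = ca->next;
        ca->comp_func(ca, do_action);
    }
}

/* After commit or abort: perform the matching list, and call the other
 * list with do_action == false so its actions can just clean up. */
static void comp_finish(comp_lists_t *l, bool committed)
{
    comp_drain(&l->commit_actions, committed);
    comp_drain(&l->abort_actions, !committed);
}

/* Example action: undo an increment made in a paused region. */
static void undo_increment(comp_action_t *ca, bool do_action)
{
    if (do_action)
        --*ca->target;
}
```

Calling every registered action in both outcomes keeps the bookkeeping in one place: an action that is not performed still gets the chance to free its data.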

SLIDE 16

Implementation, cont.

Non-isomorphic to “non-xact load/store”
  • No (asynchronous) aborts in paused region
  • Must release locks, insert compensation

SLIDE 17

Support Malloc/Free

dlmalloc uses mmap/munmap for large allocations
  • even HTM shouldn’t absorb kernel activity
  • aborted mmap leaks virtual address space
  • munmap shouldn’t be performed until commit

free implementation:
  • pause, query xact state
  • if no-xact: do operation
  • if xact: register commit action, unpause
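
The free path listed above can be sketched as a deferred-release wrapper. This is a mock-up under stated assumptions: xact_pause, xact_unpause, and xact_active are stand-ins for the hardware primitives (the in_xact flag is set by hand), and the fixed-size deferred table, tm_free, tm_on_commit, and tm_on_abort are hypothetical names, not dlmalloc or paper APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the hardware primitives. */
static bool in_xact;
static void xact_pause(void) {}
static void xact_unpause(void) {}
static bool xact_active(void) { return in_xact; }

#define MAX_DEFERRED 16
static void *deferred[MAX_DEFERRED];   /* frees postponed until commit */
static size_t n_deferred;
static size_t n_freed;                 /* counts real releases, for inspection */

static void release_now(void *p)
{
    free(p);
    ++n_freed;
}

/* free wrapper: pause, query transaction state, then act or defer. */
static void tm_free(void *p)
{
    xact_pause();
    if (!xact_active()) {
        release_now(p);                /* no transaction: do the operation */
    } else {
        if (n_deferred == MAX_DEFERRED)
            abort();                   /* sketch only: a real table would grow */
        deferred[n_deferred++] = p;    /* register commit action */
    }
    xact_unpause();
}

/* Commit: the frees become real. */
static void tm_on_commit(void)
{
    while (n_deferred > 0)
        release_now(deferred[--n_deferred]);
}

/* Abort: the frees never happened, so the blocks stay allocated. */
static void tm_on_abort(void)
{
    n_deferred = 0;
}
```

Deferring the release means an aborted transaction never returns memory to the kernel that its re-execution still expects to own.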

SLIDE 18

Pause vs. Open Nesting

Can be used for some of the same tasks

Open Nesting
  • More overhead (nesting in hardware?)
  • Stronger guarantees (transaction)
  • Not always necessary
      • Isolated data items (use CAS)
      • Thread-local data

SLIDE 19

Conclusion

Shown two extensions to HTM system
  • Support non-busy waiting by transactions
  • Support non-transactional work in transaction

Minimal impact on hardware
  • extension of existing XSW
  • calling of software handlers through exceptions