Outline 0024 Spring 2010 24 :: 2 Parallel application - - PowerPoint PPT Presentation
– 24 :: 2 0024 Spring 2010
Outline
Parallel application development
Lock data, not code
Do you really need locks?
No shared data => no need for locks
Recall that CSP gives you a model to avoid locks
No free lunch
Lock-free data structures
Mutex-free by design
Growing number of classes/data structures
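A minimal sketch of what "mutex-free by design" can look like in Java, assuming the standard java.util.concurrent.atomic package (lambda syntax needs Java 8 or later). The class name LockFreeCounter and the helper countWithThreads are made up for illustration. The increment retries a compare-and-set instead of taking a lock, so a delayed thread does not block the others.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: a counter that never takes a lock.
final class LockFreeCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Read the current value, compute the next one, and publish it only if
    // no other thread changed the value in between; otherwise try again.
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    int get() {
        return value.get();
    }

    // Illustration helper: several threads increment one shared counter.
    static int countWithThreads(int threads, int incrementsPerThread) {
        LockFreeCounter counter = new LockFreeCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

In practice AtomicInteger.incrementAndGet() does the same job; the explicit loop is spelled out only to show the retry pattern.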
Detour: no shared data
What if we could write programs so that there are no side-effects?
Think about the simple finite impulse response filter for N inputs
Think of computing an expensive function for N numbers
Think of searching for a string in N documents
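The "expensive function for N numbers" case can be sketched as follows, assuming Java 8 or later for the streams API. The class PureMap and the function expensive (a sum of squares, chosen only as a stand-in) are hypothetical. Because expensive reads nothing but its argument and writes no shared state, the runtime is free to evaluate the N calls in parallel without any locks.

```java
import java.util.stream.IntStream;

// Hypothetical example: applying a side-effect-free function to N inputs.
final class PureMap {
    // Stand-in for an expensive computation: 1^2 + 2^2 + ... + n^2.
    // It touches no shared data, so calls cannot interfere with each other.
    static long expensive(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Evaluate the function for every input, possibly on several threads.
    // The result array keeps the order of the inputs.
    static long[] mapAll(int[] inputs) {
        return IntStream.of(inputs)
                .parallel()
                .mapToLong(PureMap::expensive)
                .toArray();
    }
}
```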
MapReduce
Basic idea:
Parallel computing framework for a restricted parallel programming model
Useful to distribute work to a farm (cluster) of compute nodes
User specifies what needs to be done for each data item (“map”) and how results are to be combined (“reduce”)
Libraries take care of everything else:
Parallelization
Fault tolerance
Data distribution
Load balancing
MapReduce
Map()
Process a key/value pair to generate intermediate key/value pairs
Reduce()
Merge all intermediate values associated with the same key
Names originated in the functional programming world … but slightly different semantics
Example: Counting Words
Map()
Input <filename, file text>
Parses file and emits <word, count> pairs
E.g. <"hello", 1>
Reduce()
Sums all values for the same key
Emits <word, TotalCount>
E.g. <"hello", 5>, <"hello", 1>, <"hello", 2>, <"hello", 7> => <"hello", 15>
Example Use of MapReduce
Counting words in a large set of documents
map(string key, string value)
    // key: document name
    // value: document contents
    for each word w in value
        EmitIntermediate(w, "1");

reduce(string key, iterator values)
    // key: word
    // values: list of counts
    int result = 0;
    for each v in values
        result += ParseInt(v);
    Emit(AsString(result));
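The pseudocode above can be mirrored in a single-process Java sketch. This is not a MapReduce framework: the class WordCount and its methods are hypothetical, the grouping step that a real library would perform across machines is done here with an in-memory map, and words are assumed to be separated by whitespace.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine illustration of the map and reduce roles.
final class WordCount {
    // map: emit an intermediate <word, 1> pair for every word in the document.
    static List<Map.Entry<String, Integer>> map(String docName, String contents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : contents.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // reduce: sum all counts that were emitted for the same word.
    static int reduce(String word, List<Integer> counts) {
        int result = 0;
        for (int c : counts) {
            result += c;
        }
        return result;
    }

    // Stand-in for the library: run map on each document, group the
    // intermediate pairs by key, then run reduce once per key.
    static Map<String, Integer> run(Map<String, String> documents) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(doc.getKey(), doc.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            totals.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return totals;
    }
}
```

Since map and reduce share no data, a framework could run every map call and every reduce call on a different node; only the grouping step needs coordination.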
Data Distribution
Input files are split into pieces on the distributed file system
Intermediate files created from map tasks are written to local disk
Output files are written to the distributed file system
Assigning Tasks
Many copies of the user program are started
The system tries to exploit data locality by running map tasks on the machines that hold the data
One instance becomes the master
The master finds idle machines and assigns tasks to them
Why Locking Doesn't Scale
Not Robust
Relies on conventions
Hard to Use: conservative, deadlocks, lost wake-ups
Not Composable
Locks are not Robust
If a thread holding a lock is delayed …
… no one else can make progress
Locking Relies on Conventions
The relation between the lock bit and the object bits exists only in the programmer's mind
/*
 * When a locked buffer is visible to the I/O layer
 * BH_Launder is set. This means before unlocking
 * we must clear BH_Launder,mb() on alpha and then
 * clear BH_Lock, so no reader can see BH_Launder set
 * on an unlocked buffer and then risk to deadlock.
 */
Actual comment from Linux Kernel
(hat tip: Bradley Kuszmaul)
Sadistic Homework
FIFO queue, enq(x) and enq(y): no interference if ends “far enough” apart
Double-ended queue, deq() and deq(): interference OK if ends “close enough” together
Double-ended queue, deq() and deq(): make sure suspended dequeuers awake as needed
In Search of the Lost Wake-Up
Waiting thread doesn't realize when to wake up
It's a real problem in big systems
“Calling pthread_cond_signal() or pthread_cond_broadcast() when the thread does not hold the mutex lock associated with the condition can lead to lost wake-up bugs.”
(from a Google™ search for “lost wake-up”)
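The same discipline, sketched in Java rather than pthreads. Java's built-in monitors throw IllegalMonitorStateException if notify is called without holding the monitor, so that particular mistake cannot go unnoticed; what remains is the convention of changing the condition and signalling under the same lock, and re-checking the condition in a loop after waking. The class GuardedBuffer and the helper handOff are hypothetical names used only for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical example: a buffer whose consumers sleep until an item arrives.
final class GuardedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>();

    // The state change and the signal happen while holding the same monitor
    // that take() uses, so a waiter cannot miss the notification.
    synchronized void put(T item) {
        items.addLast(item);
        notifyAll();
    }

    // The condition is tested while holding the monitor and re-tested after
    // every wake-up; wait() releases the monitor while the thread sleeps.
    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();
        }
        return items.removeFirst();
    }

    // Illustration helper: one thread produces, the caller consumes.
    static int handOff(int value) {
        GuardedBuffer<Integer> buffer = new GuardedBuffer<>();
        Thread producer = new Thread(() -> buffer.put(value));
        producer.start();
        try {
            int received = buffer.take();
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```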
You Try It …
One lock?
Too Conservative
Locks at each end?
Deadlock, too complicated, etc
Waking blocked dequeuers?
Harder than it looks
Actual Solution
Clean solution would be a publishable result [Michael & Scott, PODC 96]
High-performance fine-grained lock-based solutions are good for libraries …
… not general consumption by programmers
Locks do not compose
Hash table T1: add(T1, item)
Must lock T1 before adding item
Move from T1 to T2: delete(T1, item), add(T2, item)
Must lock T2 before deleting from T1
Exposing lock internals breaks abstraction
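A small Java sketch of why the move from T1 to T2 does not compose. The classes LockedTable and Mover are hypothetical. Each table is thread-safe on its own, but making the move atomic forces the caller to reach for both tables' locks, which is only possible because the lock field is exposed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical table whose individual operations are each thread-safe.
final class LockedTable<T> {
    private final Set<T> items = new HashSet<>();
    // Exposed on purpose: without it, callers cannot build an atomic move.
    final Object lock = new Object();

    boolean add(T item) {
        synchronized (lock) { return items.add(item); }
    }

    boolean delete(T item) {
        synchronized (lock) { return items.remove(item); }
    }

    boolean contains(T item) {
        synchronized (lock) { return items.contains(item); }
    }
}

final class Mover {
    // Lock T2 before deleting from T1, so no other thread can observe the
    // item missing from both tables. A concurrent move in the opposite
    // direction takes the two locks in the opposite order and may deadlock;
    // avoiding that needs yet another convention (a global lock order).
    static <T> boolean move(LockedTable<T> t1, LockedTable<T> t2, T item) {
        synchronized (t2.lock) {
            synchronized (t1.lock) {
                if (!t1.delete(item)) {
                    return false;
                }
                t2.add(item);
                return true;
            }
        }
    }
}
```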
Monitor Wait and Signal
If the buffer is empty, wait for an item to show up
Wait and Signal do not Compose
Two empty buffers: wait for either?
The Transactional Manifesto
What we do now is inadequate to meet the multi-core challenge
Research Agenda
Replace locking with a transactional API
Design languages to support this model
Implement the run-time to be fast enough
Transactions
Atomic
Commit: takes effect
Abort: effects rolled back (usually retried)
Linearizable
Appear to happen in one-at-a-time order
Atomic Blocks
atomic {
    x.remove(3);
    y.add(3);
}
atomic {
    y = null;
}
No data race
Sadistic Homework Revisited
(1) Write sequential code
public void LeftEnq(item x) {
    Qnode q = new Qnode(x);
    q.left = this.left;
    this.left.right = q;
    this.left = q;
}
Enclose in atomic block
public void LeftEnq(item x) {
    atomic {
        Qnode q = new Qnode(x);
        q.left = this.left;
        this.left.right = q;
        this.left = q;
    }
}
Warning
Not always this simple
Conditional waits
Enhanced concurrency
Overlapping locks
But often it is
Works for sadistic homework
Composition
public void Transfer(Queue q1, q2) {
    atomic {
        Object x = q1.deq();
        q2.enq(x);
    }
}
Trivial or what?
Wake-ups: lost and found
public Object LeftDeq() {
    atomic {
        if (this.left == null)
            retry;
        …
    }
}
Roll back transaction and restart when something changes
OrElse Composition
atomic {
    x = q1.deq();
} orElse {
    x = q2.deq();
}
Run 1st method. If it retries …
Run 2nd method. If it retries …
Entire statement retries
Transactional Memory
Software transactional memory (STM)
Hardware transactional memory (HTM)
Hybrid transactional memory (HyTM): try in hardware and default to software if unsuccessful
Design Issues
Implementation choices
Language design issues
Semantic issues
Granularity
Object
Managed languages: Java, C#, …
Easy to control interactions between transactional & non-transactional threads
Word
C, C++, …
Hard to control interactions between transactional & non-transactional threads
Direct/Deferred Update
Deferred
Modify private copies & install on commit
Commit requires work
Consistency easier
Direct
Modify in place, roll back on abort
Makes commit efficient
Consistency harder
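To make the deferred-update option concrete, here is a deliberately tiny Java sketch. It is an assumption-laden toy, not how any particular STM is implemented: Cell and DeferredTx are hypothetical names, all commits are serialized by one global lock, and conflicts are detected only at commit time by comparing version numbers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared memory location with a version counter.
final class Cell {
    int value;
    int version;
    Cell(int value) { this.value = value; }
}

// Hypothetical transaction using deferred update: writes go to a private
// buffer and are installed in the shared cells only if commit succeeds.
final class DeferredTx {
    private static final Object COMMIT_LOCK = new Object();
    private final Map<Cell, Integer> readVersions = new HashMap<>();
    private final Map<Cell, Integer> writes = new HashMap<>();

    int read(Cell c) {
        Integer pending = writes.get(c);
        if (pending != null) {
            return pending; // the transaction sees its own buffered write
        }
        synchronized (COMMIT_LOCK) {
            readVersions.putIfAbsent(c, c.version); // remember what we saw
            return c.value;
        }
    }

    void write(Cell c, int v) {
        writes.put(c, v); // private copy; shared memory is untouched
    }

    // Returns false (abort) if any cell read by this transaction has been
    // changed by another commit; nothing needs to be rolled back.
    boolean commit() {
        synchronized (COMMIT_LOCK) {
            for (Map.Entry<Cell, Integer> e : readVersions.entrySet()) {
                if (e.getKey().version != e.getValue()) {
                    return false;
                }
            }
            for (Map.Entry<Cell, Integer> e : writes.entrySet()) {
                e.getKey().value = e.getValue();
                e.getKey().version++;
            }
            return true;
        }
    }
}
```

The trade-off named on the slide is visible here: abort is free because shared memory was never modified, while commit has to validate the reads and copy every buffered write.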
Conflict Detection
Eager
Detect before conflict arises
“Contention manager” module resolves
Lazy
Detect on commit/abort
Mixed
Eager write/write, lazy read/write …
Conflict Detection
Eager detection may abort a transaction that could have committed.
Lazy detection discards more computation.
Contention Managers
Oracle that resolves conflicts
For eager conflict detection
CM decides
Whether to abort other transaction
Or give it a chance to finish …
Contention Manager Strategies
Exponential backoff
Oldest gets priority
Most work done gets priority
Non-waiting has priority over waiting
Lots of alternatives
None seems to dominate
But choice can have big impact
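Of the strategies listed above, exponential backoff is the easiest to sketch. The Java fragment below is a generic retry loop under stated assumptions, not a contention manager from any real transactional memory system: the class BackoffRetry, the method run, and the delay constants are all made up, and the conflicting operation is modelled as a BooleanSupplier that returns true when it succeeded.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

// Hypothetical retry helper: after each failed attempt, pause for a random
// time whose upper bound doubles, so colliding threads drift apart.
final class BackoffRetry {
    private static final long MAX_DELAY_NANOS = 1_000_000; // arbitrary 1 ms cap

    // Returns the number of attempts that were needed, or -1 if every one
    // of the maxAttempts attempts failed.
    static int run(BooleanSupplier attempt, int maxAttempts) {
        long limitNanos = 1_000; // arbitrary initial bound of 1 microsecond
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
            long delay = ThreadLocalRandom.current().nextLong(limitNanos) + 1;
            LockSupport.parkNanos(delay);
            limitNanos = Math.min(limitNanos * 2, MAX_DELAY_NANOS);
        }
        return -1;
    }
}
```

The randomization matters as much as the doubling: two transactions that back off for exactly the same time would simply collide again.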
I/O & System Calls?
Some I/O revocable
Provide transaction-safe libraries
Undoable file system/DB calls
Some not
Opening cash drawer
Firing missile
I/O & System Calls
One solution: make transaction irrevocable
If transaction tries I/O, switch to irrevocable mode.
There can be only one …
Requires serial execution
No explicit aborts in irrevocable transactions
More problems ahead
Maybe we can revisit those issues in a future class on Advanced Parallel Programming, after you have taken classes on System Programming, Computer Architecture, Compilers, and Performance Evaluation.
Parallel programming is difficult
Stand on the shoulders of giants
Don't roll your own {lock, event queue, …} if there exists a solution provided by others
But know how to build your own if necessary
Know if you can trust the solution
E.g., Java 6 (and Java 7) provide many thread-safe utility classes
Program at the highest level possible
OpenMP may not be the most elegant language but if your problem fits the model, it's a big win.
Let the system handle the details
See above
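As one illustration of using a provided thread-safe utility class instead of rolling your own, the sketch below shares work between threads through java.util.concurrent.ConcurrentLinkedQueue, a non-blocking queue that has shipped with Java since version 5 and whose documentation describes it as based on the Michael and Scott algorithm. The class SharedWorkQueue and the method sumWithWorkers are hypothetical, and the lambda syntax assumes Java 8 or later.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: worker threads drain a shared queue without any
// user-written locks; the library classes handle the synchronization.
final class SharedWorkQueue {
    static long sumWithWorkers(int items, int workers) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 1; i <= items; i++) {
            queue.add(i);
        }
        AtomicLong total = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int t = 0; t < workers; t++) {
            threads[t] = new Thread(() -> {
                Integer next;
                // poll() returns null once the queue is empty.
                while ((next = queue.poll()) != null) {
                    total.addAndGet(next);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }
}
```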
… but if you must do it, do it right
Document your design & implementation
Prove essential properties of the solution
What we did not teach
We (almost) avoided the issue of “performance”
Why?
Enough material
Course objective is to advance your understanding of (imperative) programming
… and to prepare for several subsequent classes
Must understand more aspects of computer architecture
Caches
Cache management
Instruction sets of modern processors
Implementation of JCSP not “performance-sensitive”
Implementation of JOMP not “performance-sensitive”
But good implementations for C/C++ and Fortran exist
Future classes
System programming and computer architecture
Operating systems and networks
? Practice of Parallel Programming
? Advanced Parallel Programming
Concurrent object-oriented programming I
Concurrent object-oriented programming II
Advanced Parallel Computing for Scientific Applications