
Concurrency, Races & Synchronization
CS 450: Operating Systems
Michael Lee <lee@iit.edu>

Agenda:
- Concurrency: what, why, how
- Concurrency-related problems
- Locks & locking strategies
- Concurrent programming with semaphores


  1. void contextswitch(Context *from, Context *to) {
         if (swapcontext(&from->uc, &to->uc) < 0) {
             fprint(2, "swapcontext failed: %r\n");
             assert(0);
         }
     }

     int swapcontext(ucontext_t *oucp, const ucontext_t *ucp) {
         if (getcontext(oucp) == 0)
             setcontext(ucp);
         return 0;
     }

     #define setcontext(u) setmcontext(&(u)->uc_mcontext)
     #define getcontext(u) getmcontext(&(u)->uc_mcontext)
     #define SET setmcontext
     #define GET getmcontext

     struct ucontext {
         sigset_t uc_sigmask;
         mcontext_t uc_mcontext;
         ...
     };

     struct mcontext {
         ...
         int mc_ebp;
         ...
         int mc_ecx;
         int mc_eax;
         ...
         int mc_eip;
         int mc_cs;
         int mc_eflags;
         int mc_esp;
         ...
     };

     SET:
         movl 4(%esp), %eax
         ...
         movl 28(%eax), %ebp
         ...
         movl 72(%eax), %esp
         pushl 60(%eax)          /* new %eip */
         movl 48(%eax), %eax
         ret

     GET:
         movl 4(%esp), %eax
         ...
         movl %ebp, 28(%eax)
         ...
         movl $1, 48(%eax)       /* %eax */
         movl (%esp), %ecx       /* %eip */
         movl %ecx, 60(%eax)
         leal 4(%esp), %ecx      /* %esp */
         movl %ecx, 72(%eax)
         movl 44(%eax), %ecx     /* restore %ecx */
         movl $0, %eax
         ret
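A minimal, self-contained sketch (not from the slides) of the same style of user-level context switching, using the POSIX ucontext API (getcontext/makecontext/swapcontext); a coroutine runs on its own stack and ping-pongs control with main:

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];      /* private stack for the coroutine */

static void coroutine(void) {
    printf("coroutine: first entry\n");
    swapcontext(&co_ctx, &main_ctx);  /* save our context, resume main */
    printf("coroutine: resumed\n");
}                                     /* returning goes to uc_link (main) */

int main(void) {
    getcontext(&co_ctx);              /* start from the current context */
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;       /* where to go when coroutine returns */
    makecontext(&co_ctx, coroutine, 0);

    swapcontext(&main_ctx, &co_ctx);  /* run coroutine until it yields */
    printf("main: coroutine yielded\n");
    swapcontext(&main_ctx, &co_ctx);  /* resume it to completion */
    printf("main: done\n");
    return 0;
}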

  2. Next: return to reason #3 for concurrency (performance)

  3. int A[DIM][DIM],    /* src matrix A */
         B[DIM][DIM],    /* src matrix B */
         C[DIM][DIM];    /* dest matrix C */

     /* C = A x B */
     void matrix_mult() {
         int i, j, k;
         for (i = 0; i < DIM; i++) {
             for (j = 0; j < DIM; j++) {
                 C[i][j] = 0;
                 for (k = 0; k < DIM; k++)
                     C[i][j] += A[i][k] * B[k][j];
             }
         }
     }

     Run time, with DIM=50, 500 iterations:
         real 0m1.279s
         user 0m1.260s
         sys  0m0.012s

  4. void *row_dot_col(void *index) {    /* must return void * for pthreads */
         int *pindex = (int *)index;
         int i = pindex[0];
         int j = pindex[1];
         C[i][j] = 0;
         for (int x = 0; x < DIM; x++)
             C[i][j] += A[i][x] * B[x][j];
         return NULL;
     }

     void run_with_thread_per_cell() {
         pthread_t ptd[DIM][DIM];
         int index[DIM][DIM][2];
         for (int i = 0; i < DIM; i++)
             for (int j = 0; j < DIM; j++) {
                 index[i][j][0] = i;
                 index[i][j][1] = j;
                 pthread_create(&ptd[i][j], NULL, row_dot_col, index[i][j]);
             }
         for (int i = 0; i < DIM; i++)
             for (int j = 0; j < DIM; j++)
                 pthread_join(ptd[i][j], NULL);
     }

     Run time, with DIM=50, 500 iterations:
         real 4m18.013s
         user 0m33.655s
         sys  4m31.936s

  5. void *compute_rows(void *arg) {
         int *bounds = (int *)arg;     /* [first row, last row], inclusive */
         for (int i = bounds[0]; i <= bounds[1]; i++) {
             for (int j = 0; j < DIM; j++) {
                 C[i][j] = 0;
                 for (int k = 0; k < DIM; k++)
                     C[i][j] += A[i][k] * B[k][j];
             }
         }
         return NULL;
     }

     void run_with_n_threads(int num_threads) {
         pthread_t tid[num_threads];
         int tdata[num_threads][2];
         int n_per_thread = DIM / num_threads;
         for (int i = 0; i < num_threads; i++) {
             tdata[i][0] = i * n_per_thread;
             tdata[i][1] = (i < num_threads - 1)        /* last thread also */
                           ? ((i + 1) * n_per_thread) - 1 /* takes leftover */
                           : DIM - 1;                     /* rows           */
             pthread_create(&tid[i], NULL, compute_rows, tdata[i]);
         }
         for (int i = 0; i < num_threads; i++)
             pthread_join(tid[i], NULL);
     }

  6. [Chart: run time (seconds) vs. number of threads (1-10) for run_with_n_threads on a dual-processor system with kernel threading, DIM=50, 500 iterations; separate curves for real, user, and system time]

  7. but matrix multiplication happens to be an embarrassingly parallel computation - not typical of concurrent tasks!

  8. computations on shared data are typically interdependent (and this isn’t always obvious!) — may impose a cap on parallelizability

  9. Amdahl's law predicts max speedup given two parameters:
 - P : parallelizable fraction of the program
 - N : # of execution cores

  10. max speedup S = 1 / (P/N + (1 − P))
 - as P → 1, S → N
 - as N → ∞, S → 1/(1 − P)
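A quick numeric check of the formula (a hypothetical calculation, not from the slides): even with 95% of a program parallelizable, the speedup saturates near 1/(1 − P) = 20 no matter how many cores are added.

#include <stdio.h>

/* Amdahl's law: S = 1 / (P/N + (1 - P)) */
double amdahl(double P, int N) {
    return 1.0 / (P / N + (1.0 - P));
}

int main(void) {
    for (int N = 1; N <= 1024; N *= 4)
        printf("P=0.95, N=%4d -> S = %5.2f\n", N, amdahl(0.95, N));
    return 0;   /* S approaches 20 as N grows */
}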

  11. source: http://en.wikipedia.org/wiki/File:AmdahlsLaw.svg

  12. Amdahl’s law assumes a fixed problem size with a fixed parallelizable fraction — but we can argue that as we get more computing power, we simply tend to throw larger / more granular problem sets at it

  13. e.g.,
 - graphics processing: keep turning up resolution/detail
 - weather modeling: increase model parameters/accuracy
 - chess/weiqi AI: deeper search trees

  14. Gustafson & Barsis posit that:
 - we tend to scale problem size to complete in the same amount of time, regardless of the number of cores
 - the parallelizable amount of work scales linearly with the number of cores

  15. Gustafson’s Law computes speedup based on:
 - N cores
 - the non-parallelizable (serial) fraction, P

  16. speedup S = N − P ∙ (N − 1)
 - as P → 1, S → 1
 - as P → 0, S → N
 - predicted speedup is linear with respect to the number of cores!
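For contrast, plugging numbers into Gustafson's law (again a hypothetical calculation; note that P here is the serial fraction) shows near-linear scaling:

#include <stdio.h>

/* Gustafson's law: S = N - P * (N - 1), with P the serial fraction */
double gustafson(double P, int N) {
    return N - P * (N - 1);
}

int main(void) {
    for (int N = 1; N <= 1024; N *= 4)
        printf("P=0.05, N=%4d -> S = %7.2f\n", N, gustafson(0.05, N));
    return 0;   /* S grows roughly linearly with N */
}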

  17. [Chart: predicted speedup S vs. number of cores N under Gustafson's law]

  18. Amdahl’s vs. Gustafson’s:
 - the latter has rosier implications for big data / data science
 - but not all datasets naturally increase in resolution
 - both stress the importance of maximizing parallelization

  19. some primary challenges of concurrent programming are to:
 1. identify thread interdependencies
 2. identify the potential ramifications of (1)
 3. ensure correctness

  20. e.g., what is the final change in count? (expected = +2)

     Thread A                Thread B
     a1  count = count + 1   b1  count = count + 1

     interdependency: shared variable count

  21. factoring in machine-level granularity:

     Thread A                Thread B
     a1  lw  (count), %r0    b1  lw  (count), %r0
     a2  add $1, %r0         b2  add $1, %r0
     a3  sw  %r0, (count)    b3  sw  %r0, (count)

     answer: either +1 or +2!
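This lost-update race is easy to reproduce. In the sketch below (my example, not from the slides; compile with -pthread), two threads each increment a shared counter a million times with no synchronization, and the final total usually falls well short of the expected value:

#include <stdio.h>
#include <pthread.h>

#define ITERS 1000000
long count = 0;               /* shared, unprotected */

void *incr(void *arg) {
    for (int i = 0; i < ITERS; i++)
        count = count + 1;    /* load / add / store: not atomic! */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, incr, NULL);
    pthread_create(&b, NULL, incr, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("count = %ld (expected %d)\n", count, 2 * ITERS);
    return 0;
}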

  22. a race condition exists when results are dependent on the order of execution of concurrent tasks

  23. shared resources are the problem; more specifically, the concurrent mutability of shared resources

  24. code that accesses shared resource(s) = critical section

  25. synchronization : time-sensitive coordination of critical sections so as to avoid race conditions

  26. e.g., specific ordering of different threads, or mutually exclusive access to variables

  27. important: try to separate and decouple application logic from synchronization details - not doing this well adds unnecessary complexity to high-level code, and makes it much harder to test and maintain!

  28. the most common technique for implementing synchronization is via software “locks” - explicitly acquired & released by consumers of shared resources

  29. § Locks & Locking Strategies

  30. basic idea:
 - create a shared software construct that has well-defined concurrency semantics - a.k.a. a “thread-safe” object
 - use this object as a guard for another, un-thread-safe shared resource (see the sketch below)
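One way to realize the guard idea with POSIX threads (a sketch; the slides don't prescribe a particular lock API) is to pair the un-thread-safe count with a pthread_mutex_t:

#include <pthread.h>

long count = 0;                                         /* guarded resource */
pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER; /* its guard */

void increment_count(void) {
    pthread_mutex_lock(&count_lock);    /* acquire */
    count = count + 1;                  /* use the shared resource */
    pthread_mutex_unlock(&count_lock);  /* release */
}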

  31. [Diagram: threads TA and TB, each about to run count = count + 1, both attempt to acquire the lock guarding count]

  32. [Diagram: the lock on count is allocated to one thread; the other thread's acquire blocks]

  33. [Diagram: the lock holder uses count while the other thread continues to wait on acquire]

  34. [Diagram: the lock holder releases the lock while the other thread's acquire is still pending]

  35. [Diagram: the waiting thread now acquires the lock and uses count]

  36. locking can be:
 - global (coarse-grained)
 - per-resource (fine-grained)

  37. [Diagram: coarse-grained locking policy - a single global lock guards count, buff, GUI, and logfile; threads TA, TB, TC, TD all contend for it]

  38. [Diagram: coarse-grained locking policy, continued]

  39. [Diagram: coarse-grained locking policy, continued]

  40. coarse-grained locking:
 - is (typically) easier to reason about
 - results in a lot of lock contention
 - could result in poor resource utilization — may be impractical for this reason

  41. [Diagram: fine-grained locking policy - separate locks guard count, buff, GUI, and logfile; threads TA, TB, TC, TD acquire only the locks they need]

  42. fine-grained locking:
 - may reduce (individual) lock contention
 - may improve resource utilization
 - can result in a lot of locking overhead
 - can make it much harder to verify correctness! (e.g., due to problems such as deadlock)

  43. [Diagram: deadlock with a fine-grained locking policy - threads each hold one of the count/buff/GUI/logfile locks while waiting for a lock another thread holds]
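The classic recipe for this deadlock is two threads taking the same two locks in opposite orders. A sketch (hypothetical thread bodies; lock names borrowed from the diagram) that usually hangs when run:

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t buff_lock  = PTHREAD_MUTEX_INITIALIZER;

void *thread1(void *arg) {
    pthread_mutex_lock(&count_lock);   /* takes count first... */
    pthread_mutex_lock(&buff_lock);    /* ...then buff */
    /* ... use count and buff ... */
    pthread_mutex_unlock(&buff_lock);
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

void *thread2(void *arg) {
    pthread_mutex_lock(&buff_lock);    /* takes buff first: opposite order! */
    pthread_mutex_lock(&count_lock);
    /* ... use buff and count ... */
    pthread_mutex_unlock(&count_lock);
    pthread_mutex_unlock(&buff_lock);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);   /* with unlucky timing, never returns */
    pthread_join(t2, NULL);
    printf("no deadlock this run\n");
    return 0;
}
/* fix: impose a global lock-acquisition order, e.g., always count before buff */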

  44. so far, we have only considered mutual exclusion; what about instances where we require a specific order of execution?
 - often very difficult to achieve with simple-minded locks

  45. § Abstraction: Semaphore

  46. reference: The Little Book of Semaphores, by Allen B. Downey

  47. Semaphore rules:
 1. When you create the semaphore, you can initialize its value to any integer, but after that the only operations you are allowed to perform are increment (increase by one) and decrement (decrease by one). You cannot read the current value of the semaphore.
 2. When a thread decrements the semaphore, if the result is negative, the thread blocks itself and cannot continue until another thread increments the semaphore.
 3. When a thread increments the semaphore, if there are other threads waiting, one of the waiting threads gets unblocked.
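For reference, POSIX unnamed semaphores map onto these rules roughly as follows (a sketch; note that POSIX, unlike the book's idealized semaphore, does expose the current value via sem_getvalue, but code written in this style should not rely on it):

#include <semaphore.h>

sem_t fred;

void example(void) {
    sem_init(&fred, 0, 1);  /* initial value 1; pshared=0: threads only */
    sem_wait(&fred);        /* decrement; blocks while the value is 0 */
    /* ... */
    sem_post(&fred);        /* increment; wakes one waiter, if any */
    sem_destroy(&fred);
}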

  48. Initialization syntax (Listing 2.1 in the book):

     fred = Semaphore(1)

  49. Operation names?

     fred.increment_and_wake_a_waiting_process_if_any()
     fred.decrement_and_block_if_the_result_is_negative()

     fred.increment()
     fred.decrement()

     fred.signal()
     fred.wait()

     fred.V()
     fred.P()

  50. How to use semaphores for synchronization?
 1. identify essential usage “patterns”
 2. solve “classic” synchronization problems

  51. Essential synchronization criteria:
 1. avoid starvation
 2. guarantee bounded waiting
 3. make no assumptions about the relative speed of threads
 4. allow for maximum concurrency

  52. § Using Semaphores for Synchronization

  53. Basic patterns:
 I. Rendezvous
 II. Mutual exclusion (Mutex)
 III. Multiplex
 IV. Generalized rendezvous / Barrier & Turnstile

  54. I. Rendezvous

     Thread A           Thread B
     1 statement a1     1 statement b1
     2 statement a2     2 statement b2

     Ensure that a1 < b2 and b1 < a2 (read “<” as “happens before”)

  55. aArrived = Semaphore(0)
     bArrived = Semaphore(0)

     Thread A               Thread B
     1 statement a1         1 statement b1
     2 aArrived.signal()    2 bArrived.signal()
     3 bArrived.wait()      3 aArrived.wait()
     4 statement a2         4 statement b2
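The same pattern in C with POSIX semaphores and pthreads (a sketch of my own; the statements are just prints):

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t aArrived, bArrived;    /* both initialized to 0 */

void *thread_a(void *arg) {
    printf("statement a1\n");
    sem_post(&aArrived);     /* aArrived.signal() */
    sem_wait(&bArrived);     /* bArrived.wait() */
    printf("statement a2\n");
    return NULL;
}

void *thread_b(void *arg) {
    printf("statement b1\n");
    sem_post(&bArrived);
    sem_wait(&aArrived);
    printf("statement b2\n");
    return NULL;
}

int main(void) {
    pthread_t a, b;
    sem_init(&aArrived, 0, 0);
    sem_init(&bArrived, 0, 0);
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;    /* a2 and b2 always print after both a1 and b1 */
}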

  56. Note: swapping lines 2 & 3 → deadlock!

     Thread A               Thread B
     1 statement a1         1 statement b1
     2 bArrived.wait()      2 aArrived.wait()
     3 aArrived.signal()    3 bArrived.signal()
     4 statement a2         4 statement b2

     Each thread is waiting for a signal that will never arrive

  57. II. Mutual exclusion

     Thread A             Thread B
     count = count + 1    count = count + 1

     Ensure that the critical sections do not overlap

  58. mutex = Semaphore(1)

     Here is a solution:

     Thread A              Thread B
     mutex.wait()          mutex.wait()
     # critical section    # critical section
     count = count + 1     count = count + 1
     mutex.signal()        mutex.signal()

     Danger: if a thread blocks while “holding” the mutex semaphore, it will also block all other mutex-ed threads!

  59. III. Multiplex

     multiplex = Semaphore(N)

     multiplex.wait()
     # critical section
     multiplex.signal()

     Permits up to N threads through into their critical sections at a time
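A C rendering of the multiplex pattern (a sketch; N = 4 and the 10 workers are arbitrary choices). With N = 1 this degenerates to the mutex pattern above:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define N 4                      /* at most N threads inside at once */
sem_t multiplex;

void *worker(void *arg) {
    sem_wait(&multiplex);        /* blocks once N threads are inside */
    printf("thread %ld in critical section\n", (long)arg);
    sem_post(&multiplex);        /* make room for the next thread */
    return NULL;
}

int main(void) {
    pthread_t tid[10];
    sem_init(&multiplex, 0, N);  /* initial value N */
    for (long i = 0; i < 10; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < 10; i++)
        pthread_join(tid[i], NULL);
    return 0;
}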

  60. IV. Generalized Rendezvous / Barrier

     Puzzle: generalize the rendezvous solution. Every thread should run the following code (Listing 3.2):

     rendezvous
     critical point

  61. Hint:

     n = the number of threads
     count = 0
     mutex = Semaphore(1)
     barrier = Semaphore(0)

  62.
     1  rendezvous
     2
     3  mutex.wait()
     4      count = count + 1
     5  mutex.signal()
     6
     7  if count == n: barrier.signal()
     8
     9  barrier.wait()
     10 barrier.signal()
     11
     12 critical point

  63.
     1  rendezvous
     2
     3  mutex.wait()
     4      count = count + 1
     5  mutex.signal()
     6
     7  if count == n: turnstile.signal()
     8
     9  turnstile.wait()
     10 turnstile.signal()
     11
     12 critical point

     state of the turnstile after all threads make it to line 12?

  64.
     1  rendezvous
     2
     3  mutex.wait()
     4      count = count + 1
     5      if count == n: turnstile.signal()
     6  mutex.signal()
     7
     8  turnstile.wait()
     9  turnstile.signal()
     10
     11 critical point

     fix for non-determinism (but still off by one)
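Translated to C with POSIX semaphores (a sketch of my own; assumes n is fixed before any thread arrives), the pattern from this slide becomes a non-reusable barrier:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define NTHREADS 8
int n = NTHREADS;
int count = 0;
sem_t mutex, turnstile;          /* initialized to 1 and 0 in main */

void barrier_point(void) {
    sem_wait(&mutex);
    count = count + 1;
    if (count == n)
        sem_post(&turnstile);    /* last arrival opens the turnstile */
    sem_post(&mutex);

    sem_wait(&turnstile);        /* each thread passes through... */
    sem_post(&turnstile);        /* ...and re-opens it for the next */
}

void *worker(void *arg) {
    /* rendezvous */
    barrier_point();
    printf("past the barrier\n");   /* critical point */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    sem_init(&mutex, 0, 1);
    sem_init(&turnstile, 0, 0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;    /* note: turnstile ends at 1, so this barrier isn't reusable */
}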

  65. next: we would like a reusable barrier; we will need to re-lock the turnstile
