last time

reordering: processors and compilers
avoiding reordering: special instructions, compiler directives
memory fence idea: everything before fence, then everything after
cache coherency (keeping caches in sync): baseline idea


1. one possible implementation

    struct Mutex {
        SpinLock guard_spinlock;  // spinlock protecting lock_taken and wait_queue;
                                  // only held for very short amount of time (compared to mutex itself)
        bool lock_taken = false;  // tracks whether any thread has locked and not unlocked
        WaitQueue wait_queue;     // list of threads that discovered lock is taken
                                  // and are waiting for it to be free; these threads are not runnable
    };

    LockMutex(Mutex *m) {
        LockSpinlock(&m->guard_spinlock);
        if (m->lock_taken) {
            put current thread on m->wait_queue
            make current thread not runnable  /* xv6: myproc()->state = SLEEPING; */
            UnlockSpinlock(&m->guard_spinlock);
            run scheduler
        } else {
            m->lock_taken = true;
            UnlockSpinlock(&m->guard_spinlock);
        }
    }

    UnlockMutex(Mutex *m) {
        LockSpinlock(&m->guard_spinlock);
        if (m->wait_queue not empty) {
            remove a thread from m->wait_queue
            make that thread runnable  /* xv6: myproc()->state = RUNNABLE; */
            // instead of setting lock_taken to false: choose thread to hand off lock to
        } else {
            m->lock_taken = false;
        }
        UnlockSpinlock(&m->guard_spinlock);
    }

subtle: what if UnlockMutex runs on another core between marking the thread not runnable and running the scheduler? the scheduler on another core might want to switch to it before it saves registers; this is an issue to handle when marking threads not runnable for any reason, and we need to work with the scheduler to prevent it

2. mutex and scheduler subtlety

    core 0 (thread A):
        start LockMutex
        acquire spinlock
        discover lock taken
        enqueue thread A
        thread A set not runnable
        release spinlock
        thread A runs scheduler ... finally saving registers ...
    core 1 (thread B):
        start UnlockMutex
        dequeue thread A
        thread A set runnable
    core 2:
        run scheduler
        scheduler switches to A ... with old version of registers

xv6 soln.: hold scheduler lock until thread A saves registers
Linux soln.: track that/check if thread A is still on core 0

3. mutex efficiency

'normal' mutex uncontended case:
    lock: acquire + release spinlock, see lock is free
    unlock: acquire + release spinlock, see queue is empty
not much slower than spinlock

4. recall: pthread mutex

    #include <pthread.h>

    pthread_mutex_t some_lock;
    pthread_mutex_init(&some_lock, NULL);
    // or: pthread_mutex_t some_lock = PTHREAD_MUTEX_INITIALIZER;
    ...
    pthread_mutex_lock(&some_lock);
    ...
    pthread_mutex_unlock(&some_lock);
    pthread_mutex_destroy(&some_lock);
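As a compilable illustration of this API (the counter example is ours, not from the slides): two threads both increment a shared counter, and the mutex makes each increment atomic.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
    static long count = 0;

    static void *worker(void *arg) {
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&count_lock);    // enter critical section
            count++;                            // protected update
            pthread_mutex_unlock(&count_lock);  // leave critical section
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("count = %ld\n", count);  // always 2000000 with the lock held
        return 0;
    }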

5. pthread mutexes: addt'l features

mutex attributes (pthread_mutexattr_t) allow: (reference: man pthread.h)
    error-checking mutexes
        locking mutex twice in same thread?
        unlocking already unlocked mutex?
        ...
    mutexes shared between processes
        otherwise: must be only threads of same process
        (unanswered question: where to store mutex?)
    ...
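A sketch of the attribute API for error-checking mutexes (PTHREAD_MUTEX_ERRORCHECK is standard POSIX; the surrounding demo is ours). For process-shared mutexes, the analogous call is pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED).

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        pthread_mutexattr_t attr;
        pthread_mutex_t m;

        pthread_mutexattr_init(&attr);
        // ask for an error-checking mutex instead of the default type
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&m, &attr);
        pthread_mutexattr_destroy(&attr);

        pthread_mutex_lock(&m);
        int rc = pthread_mutex_lock(&m);          // second lock in same thread
        printf("relock: %s\n", strerror(rc));     // reports EDEADLK instead of hanging

        pthread_mutex_unlock(&m);
        rc = pthread_mutex_unlock(&m);            // unlock of already-unlocked mutex
        printf("re-unlock: %s\n", strerror(rc));  // reports EPERM
        pthread_mutex_destroy(&m);
        return 0;
    }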

6. POSIX mutex restrictions

pthread_mutex rule: unlock from same thread you lock in
implementation I gave before — not a problem
...but there are other ways to implement mutexes
    e.g. might involve comparing with "holding" thread ID

7. example: producer/consumer

producer → buffer → consumer

shared buffer (queue) of fixed size
one or more producers insert into queue
one or more consumers remove from queue
producer(s) and consumer(s) don't work in lockstep (might need to wait for each other to catch up)
example: C compiler: preprocessor → compiler → assembler → linker

8. monitors/condition variables

locks for mutual exclusion
condition variables for waiting for event
    operations: wait (for event); signal/broadcast (that event happened)
    related data structures
monitor = lock + 0 or more condition variables + shared data
    Java: every object is a monitor (has instance variables, built-in lock, cond. var)
    pthreads: build your own: provides you locks + condition variables

9. monitor idea

a monitor:
    lock
    shared data
    condvar 1, condvar 2, ...
    operation1(...), operation2(...), ...
threads waiting for lock
threads waiting for condition to be true about shared data
lock must be acquired before accessing any part of monitor's stuff

10. condvar operations

condvar operations:
    Wait(cv, lock) — unlock lock, add current thread to cv queue
        calling thread starts waiting ...and reacquires lock before returning
    Signal(cv) — remove one from condvar queue
        any one thread removed from cv queue to start waiting for lock
    Broadcast(cv) — remove all from condvar queue
        all threads removed from cv queue to start waiting for lock
    unlock lock — allow thread from queue to go

(diagram: a monitor — lock; shared data; condvar 1, condvar 2, ...; operation1(...), operation2(...); threads waiting for lock; threads waiting for condition to be true about shared data)

11. pthread cv usage

    pthread_mutex_t lock;
    bool finished;               // data, only accessed after acquiring lock
    pthread_cond_t finished_cv;  // to wait for 'finished' to be true
    // MISSING: init calls, etc.

    void WaitForFinished() {
        pthread_mutex_lock(&lock);   // acquire lock before reading or writing finished
        while (!finished) {          // check whether we need to wait at all
                                     // (why a loop? we'll explain later)
            pthread_cond_wait(&finished_cv, &lock);
                                     // know we need to wait
                                     // (finished can't change while we have lock)
                                     // so wait, releasing lock...
        }
        pthread_mutex_unlock(&lock);
    }

    void Finish() {
        pthread_mutex_lock(&lock);
        finished = true;
        pthread_cond_broadcast(&finished_cv);  // allow all waiters to proceed
                                               // (once we unlock the lock)
        pthread_mutex_unlock(&lock);
    }
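To see the pattern end to end, a hypothetical driver (ours, not from the slides; the sleep only encourages the waiter to block first, correctness does not depend on it):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t finished_cv = PTHREAD_COND_INITIALIZER;
    static bool finished = false;

    static void *waiter(void *arg) {
        pthread_mutex_lock(&lock);
        while (!finished)                       // loop guards against spurious wakeups
            pthread_cond_wait(&finished_cv, &lock);
        pthread_mutex_unlock(&lock);
        printf("saw finished\n");
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, waiter, NULL);
        sleep(1);                               // let the waiter block (demo only)
        pthread_mutex_lock(&lock);
        finished = true;
        pthread_cond_broadcast(&finished_cv);   // wake the waiter
        pthread_mutex_unlock(&lock);
        pthread_join(t, NULL);
        return 0;
    }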

12. WaitForFinish timeline 1

    WaitForFinish thread                       Finish thread
    mutex_lock(&lock)  (thread has lock)
    while (!finished) ...
    cond_wait(&finished_cv, &lock);
      (start waiting for cv)
                                               mutex_lock(&lock)  (start waiting for lock)
                                               (done waiting for lock)
                                               finished = true
                                               cond_broadcast(&finished_cv)
                                               mutex_unlock(&lock)
    (done waiting for cv)
    (start waiting for lock)
    (done waiting for lock)
    while (!finished) ...  (finished now true, so return)
    mutex_unlock(&lock)

13. WaitForFinish timeline 2

    WaitForFinish thread                       Finish thread
                                               mutex_lock(&lock)
                                               finished = true
                                               cond_broadcast(&finished_cv)
                                               mutex_unlock(&lock)
    mutex_lock(&lock)
    while (!finished) ...  (finished now true, so return)
    mutex_unlock(&lock)

14. why the loop

    while (!finished) {
        pthread_cond_wait(&finished_cv, &lock);
    }

we only broadcast if finished is true
so why check finished afterwards?
pthread_cond_wait manual page: "Spurious wakeups ... may occur."
spurious wakeup = wait returns even though nothing happened

15. unbounded buffer producer/consumer

    pthread_mutex_t lock;
    pthread_cond_t data_ready;
    UnboundedQueue buffer;

    Produce(item) {
        pthread_mutex_lock(&lock);
        buffer.enqueue(item);
        pthread_cond_signal(&data_ready);  // wake one Consume thread, if any are waiting
        pthread_mutex_unlock(&lock);
    }

    Consume() {
        pthread_mutex_lock(&lock);
        while (buffer.empty()) {           // check if empty; if so, wait
            pthread_cond_wait(&data_ready, &lock);
        }
        item = buffer.dequeue();           // okay because we have lock;
                                           // other threads cannot dequeue here
        pthread_mutex_unlock(&lock);
        return item;
    }

rule: never touch buffer without acquiring lock
otherwise: what if two threads simultaneously en/dequeue? (both use same array/linked list entry? both reallocate array?)

how many iterations can Consume()'s loop run?
    0 iterations: Produce() called before Consume()
    1 iteration: Produce() signalled, probably
    2+ iterations: spurious wakeup or ...?

in pthreads: signalled thread is not guaranteed to hold the lock next; it waits for the lock like any other thread
alternate design: signalled thread gets lock next, called "Hoare scheduling"; not done by pthreads, Java, ...

(timeline diagrams omitted: Thread 1 runs Produce() — lock, enqueue, signal, unlock — while Threads 2 and 3 run Consume(), see the queue empty, unlock and start waiting on data_ready, then stop waiting, reacquire the lock, and dequeue)
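A self-contained, compilable version of this monitor (the linked-list queue and the driver are our additions, standing in for UnboundedQueue):

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    // hypothetical unbounded queue: singly-linked list of ints
    struct node { int value; struct node *next; };

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
    static struct node *head = NULL, *tail = NULL;

    static void Produce(int item) {
        struct node *n = malloc(sizeof *n);
        n->value = item; n->next = NULL;
        pthread_mutex_lock(&lock);
        if (tail) tail->next = n; else head = n;  // enqueue at tail
        tail = n;
        pthread_cond_signal(&data_ready);         // wake one consumer, if any
        pthread_mutex_unlock(&lock);
    }

    static int Consume(void) {
        pthread_mutex_lock(&lock);
        while (head == NULL)                      // loop guards against spurious wakeups
            pthread_cond_wait(&data_ready, &lock);
        struct node *n = head;                    // dequeue while holding the lock
        head = n->next;
        if (head == NULL) tail = NULL;
        pthread_mutex_unlock(&lock);
        int item = n->value;
        free(n);
        return item;
    }

    static void *consumer(void *arg) {
        for (int i = 0; i < 3; i++)
            printf("consumed %d\n", Consume());
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, consumer, NULL);
        for (int i = 0; i < 3; i++)
            Produce(i);
        pthread_join(t, NULL);
        return 0;
    }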

16. Hoare versus Mesa monitors

Hoare-style monitors: signal 'hands off' lock to awoken thread
Mesa-style monitors: any eligible thread gets lock next (maybe some other idea of priority?)
every current threading library I know of does Mesa-style

17. bounded buffer producer/consumer

    pthread_mutex_t lock;
    pthread_cond_t data_ready;
    pthread_cond_t space_ready;
    BoundedQueue buffer;

    Produce(item) {
        pthread_mutex_lock(&lock);
        while (buffer.full()) {
            pthread_cond_wait(&space_ready, &lock);
        }
        buffer.enqueue(item);
        pthread_cond_signal(&data_ready);
        pthread_mutex_unlock(&lock);
    }

    Consume() {
        pthread_mutex_lock(&lock);
        while (buffer.empty()) {
            pthread_cond_wait(&data_ready, &lock);
        }
        item = buffer.dequeue();
        pthread_cond_signal(&space_ready);
        pthread_mutex_unlock(&lock);
        return item;
    }

correct (but slow?) to replace pthread_cond_signal(&space_ready) with pthread_cond_broadcast(&space_ready) (just more "spurious wakeups")
correct but slow to replace data_ready and space_ready with a 'combined' condvar ready and use broadcast (just more "spurious wakeups")
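A compilable version using a fixed-size ring buffer in place of BoundedQueue (the array representation is our assumption):

    #include <pthread.h>

    #define CAPACITY 8

    static int buf[CAPACITY];
    static int count = 0, in = 0, out = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t space_ready = PTHREAD_COND_INITIALIZER;

    void Produce(int item) {
        pthread_mutex_lock(&lock);
        while (count == CAPACITY)           // buffer full: wait for space
            pthread_cond_wait(&space_ready, &lock);
        buf[in] = item;
        in = (in + 1) % CAPACITY;
        count++;
        pthread_cond_signal(&data_ready);   // one waiting consumer can proceed
        pthread_mutex_unlock(&lock);
    }

    int Consume(void) {
        pthread_mutex_lock(&lock);
        while (count == 0)                  // buffer empty: wait for data
            pthread_cond_wait(&data_ready, &lock);
        int item = buf[out];
        out = (out + 1) % CAPACITY;
        count--;
        pthread_cond_signal(&space_ready);  // one waiting producer can proceed
        pthread_mutex_unlock(&lock);
        return item;
    }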

18. monitor pattern

    pthread_mutex_lock(&lock);
    while (!condition A) {
        pthread_cond_wait(&condvar_for_A, &lock);
    }
    ... /* manipulate shared data, changing other conditions */
    if (set condition B) {
        pthread_cond_broadcast(&condvar_for_B);  /* or signal, if only one thread cares */
    }
    if (set condition C) {
        pthread_cond_broadcast(&condvar_for_C);  /* or signal, if only one thread cares */
    }
    pthread_mutex_unlock(&lock);

19. monitors rules of thumb

never touch shared data without holding the lock
keep lock held for entire operation:
    verifying condition (e.g. buffer not full)
    up to and including manipulating data (e.g. adding to buffer)
always write loop calling cond_wait to wait for condition X
broadcast/signal condition variable every time you change X
create condvar for every kind of scenario waited for
correct but slow to...
    broadcast when just signal would work
    broadcast or signal when nothing changed
    use one condvar for multiple conditions

20. mutex/cond var init/destroy

    pthread_mutex_t mutex;
    pthread_cond_t cv;
    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&cv, NULL);
    // --OR--
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
    // and when done:
    ...
    pthread_cond_destroy(&cv);
    pthread_mutex_destroy(&mutex);

21. backup slides

22. implementing locks: single core

intuition: context switch only happens on interrupt
    timer expiration, I/O, etc. causes OS to run
solution: disable interrupts on lock; reenable on unlock
x86 instructions:
    cli — disable interrupts
    sti — enable interrupts

23. naive interrupt enable/disable (1)

    Lock() { disable interrupts }
    Unlock() { enable interrupts }

problem: can't do I/O within lock

    Lock(some_lock);
    read from disk
    /* waits forever for (disabled) interrupt from disk I/O finishing */

problem: user can hang the system:

    Lock(some_lock);
    while (true) {}

24. naive interrupt enable/disable (2)

    Lock() { disable interrupts }
    Unlock() { enable interrupts }

problem: nested locks

    Lock(store_lock);
    Lock(milk_lock);
    if (no milk) {
        buy milk
    }
    Unlock(milk_lock);
    /* interrupts enabled here?? */
    Unlock(store_lock);

25. xv6 interrupt disabling (1)

    acquire(struct spinlock *lk) {
        pushcli();  // disable interrupts to avoid deadlock
        ...         /* this part basically just for multicore */
    }

    release(struct spinlock *lk) {
        ...         /* this part basically just for multicore */
        popcli();
    }

26. xv6 push/popcli

pushcli / popcli — need to be in pairs
pushcli — disable interrupts if not already
popcli — enable interrupts if corresponding pushcli disabled them
    don't enable them if they were already disabled
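A simplified sketch modeled on xv6's implementation (real xv6 keeps ncli and intena per CPU via mycpu(); readeflags, cli, sti, FL_IF, and panic are xv6 helpers):

    static int ncli;    // depth of pushcli nesting
    static int intena;  // were interrupts enabled before the outermost pushcli?

    void pushcli(void) {
        int eflags = readeflags();    // read EFLAGS before disabling
        cli();                        // disable interrupts
        if (ncli == 0)
            intena = eflags & FL_IF;  // remember whether they were on
        ncli += 1;
    }

    void popcli(void) {
        if (readeflags() & FL_IF)
            panic("popcli - interruptible");
        if (--ncli < 0)
            panic("popcli");          // popcli without matching pushcli
        if (ncli == 0 && intena)
            sti();                    // reenable only if outermost pushcli disabled them
    }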

27. GCC: preventing reordering example (1)

    void Alice() {
        int one = 1;
        __atomic_store(&note_from_alice, &one, __ATOMIC_SEQ_CST);
        do {
        } while (__atomic_load_n(&note_from_bob, __ATOMIC_SEQ_CST));
        if (no_milk) {++milk;}
    }

compiles to (x86):

    Alice:
        movl $1, note_from_alice
        mfence
    .L2:
        movl note_from_bob, %eax
        testl %eax, %eax
        jne .L2
        ...

28. GCC: preventing reordering example (2)

    void Alice() {
        note_from_alice = 1;
        do {
            __atomic_thread_fence(__ATOMIC_SEQ_CST);
        } while (note_from_bob);
        if (no_milk) {++milk;}
    }

compiles to (x86):

    Alice:
        movl $1, note_from_alice  // note_from_alice <- 1
    .L3:
        mfence                    // make sure store is visible to other cores before loading
                                  // on x86: not needed on second + iteration of loop
        cmpl $0, note_from_bob    // if (note_from_bob == 0) repeat fence
        jne .L3
        cmpl $0, no_milk
        ...
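The same guarantee can be expressed with standard C11 atomics instead of GCC-specific builtins; a sketch (variable names from the slides, declarations ours):

    #include <stdatomic.h>

    atomic_int note_from_alice, note_from_bob;  // assumed shared flags
    int no_milk, milk;

    void Alice(void) {
        // seq_cst store/loads prevent both compiler and CPU reordering,
        // like the __atomic_* calls above
        atomic_store_explicit(&note_from_alice, 1, memory_order_seq_cst);
        while (atomic_load_explicit(&note_from_bob, memory_order_seq_cst))
            ;  // spin until Bob's note is cleared
        if (no_milk) { ++milk; }
    }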

29. xv6 spinlock: debugging stuff

    void acquire(struct spinlock *lk) {
        ...
        if (holding(lk))
            panic("acquire");
        ...
        // Record info about lock acquisition for debugging.
        lk->cpu = mycpu();
        getcallerpcs(&lk, lk->pcs);
    }

    void release(struct spinlock *lk) {
        if (!holding(lk))
            panic("release");
        lk->pcs[0] = 0;
        lk->cpu = 0;
        ...
    }

30. some common atomic operations (1)

    // x86: emulate with exchange
    test_and_set(address) {
        old_value = memory[address];
        memory[address] = 1;
        return old_value != 0;  // e.g. set ZF flag
    }

    // x86: xchg REGISTER, (ADDRESS)
    exchange(register, address) {
        temp = memory[address];
        memory[address] = register;
        register = temp;
    }
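In portable C these map onto GCC's __atomic builtins; a sketch of a spinlock built from test-and-set via atomic exchange (the lock protocol is our example):

    #include <stdbool.h>

    // returns whether *addr was already set, after atomically storing 1
    static bool test_and_set(int *addr) {
        return __atomic_exchange_n(addr, 1, __ATOMIC_ACQUIRE) != 0;
    }

    void spin_lock(int *lock) {
        while (test_and_set(lock))
            ;  // old value was 1: someone else holds the lock, retry
    }

    void spin_unlock(int *lock) {
        __atomic_store_n(lock, 0, __ATOMIC_RELEASE);  // mark lock free
    }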

31. some common atomic operations (2)

    // x86: lock xaddl REGISTER, (ADDRESS)
    fetch-and-add(address, register) {
        old_value = memory[address];
        memory[address] += register;
        register = old_value;
    }

    // x86: mov OLD_VALUE, %eax; lock cmpxchg NEW_VALUE, (ADDRESS)
    compare-and-swap(address, old_value, new_value) {
        if (memory[address] == old_value) {
            memory[address] = new_value;
            return true;   // x86: set ZF flag
        } else {
            return false;  // x86: clear ZF flag
        }
    }

32. common atomic operation pattern

try to do operation, ...
detect if it failed
if so, repeat
atomic operation does the "try and see if it failed" part
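A concrete instance of the pattern (our example): atomically raise a value to a new maximum with a compare-and-swap retry loop.

    // set *p = max(*p, v) atomically, using try / detect failure / repeat
    void atomic_max(long *p, long v) {
        long old = __atomic_load_n(p, __ATOMIC_RELAXED);
        while (old < v) {
            // try to change *p from old to v; on failure GCC's builtin
            // writes the value it actually saw back into 'old', so the
            // loop simply retries with the fresh value
            if (__atomic_compare_exchange_n(p, &old, v, false,
                                            __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
                break;
        }
    }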

33. fetch-and-add with CAS (1)

    long my_fetch_and_add(long *pointer, long amount) { ... }

implementation sketch:
    fetch old value from pointer
    compute in temporary value result of addition (new)
    try to change value at pointer from old to new [compare-and-swap]
    if not successful, repeat

    compare-and-swap(address, old_value, new_value) {
        if (memory[address] == old_value) {
            memory[address] = new_value;
            return true;
        } else {
            return false;
        }
    }

34. fetch-and-add with CAS (2)

    long my_fetch_and_add(long *p, long amount) {
        long old_value;
        do {
            old_value = *p;
        } while (!compare_and_swap(p, old_value, old_value + amount));
        return old_value;
    }
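For comparison, C11 exposes this operation directly; a usage sketch (names ours). On x86 it typically compiles to the lock xadd shown earlier.

    #include <stdatomic.h>

    atomic_long counter;

    long bump(long amount) {
        // standard-library equivalent of my_fetch_and_add:
        // returns the value counter held before the addition
        return atomic_fetch_add(&counter, amount);
    }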

35. exercise: append to singly-linked list

    void append_to_list(ListNode *head, ListNode *new_last_node) {
        ...
    }

ListNode is a singly-linked list
assume: threads only append to list (no deletions, reordering)
use compare-and-swap(pointer, old, new):
    atomically change *pointer from old to new
    return true if successful
    return false (and change nothing) if *pointer is not old
(a possible solution sketch follows)
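One possible solution sketch (ours, not the official answer): walk to the current last node, then compare-and-swap its next pointer from NULL to the new node; if the CAS fails, another thread appended first, so keep walking and retry.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct ListNode {
        struct ListNode *next;
        int value;
    } ListNode;

    // compare-and-swap on a next pointer, via GCC's builtin
    static bool cas(ListNode **ptr, ListNode *old, ListNode *new_node) {
        return __atomic_compare_exchange_n(ptr, &old, new_node, false,
                                           __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    }

    void append_to_list(ListNode *head, ListNode *new_last_node) {
        new_last_node->next = NULL;
        ListNode *cur = head;
        for (;;) {
            while (cur->next != NULL)  // find the current last node
                cur = cur->next;
            if (cas(&cur->next, NULL, new_last_node))
                return;  // cur->next was still NULL: we are the new tail
            // CAS failed: another thread appended; keep walking from cur
        }
    }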

36. spinlock problems

lock abstraction is not powerful enough
    lock/unlock operations don't handle "wait for event", a common thing we want to do with threads
    solution: other synchronization abstractions
spinlocks waste CPU time more than needed
    want to run another thread instead of an infinite loop
    solution: lock implementation integrated with scheduler
spinlocks can send a lot of messages on the shared bus
    solution: more efficient atomic operations to implement locks
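One standard mitigation for the bus-traffic problem (our sketch): spin on plain loads, which stay in the local cache, and only attempt the expensive atomic exchange when the lock looks free — "test-and-test-and-set".

    void tts_spin_lock(int *lock) {
        for (;;) {
            // read-only spin: hits the local cache, no bus messages
            while (__atomic_load_n(lock, __ATOMIC_RELAXED) != 0)
                ;
            // lock looked free: now try the atomic read-modify-write
            if (__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE) == 0)
                return;  // old value was 0: we hold the lock
        }
    }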

37. ping-ponging

spinning on an atomic read-modify-write makes the lock's cache block bounce between cores:
    some CPU (this example: CPU2) acquires lock: "I want to modify lock?"
        lock's block becomes Modified in CPU2's cache, Invalid in the others
    CPU3 read-modify-writes lock (to see it is still locked): "I want to modify lock"
        block moves to CPU3
    CPU2 read-modify-writes lock (to see it is still locked): "I want to modify lock"
        block moves back to CPU2
    CPU1 sets lock to unlocked: "I want to modify lock"
        block moves to CPU1
every "I want to modify lock" message crosses the shared bus, even while the lock's value isn't changing
