SLIDE 1

COMP 3713 — Operating Systems Slides Part 2

Jim Diamond, CAR 409, Jodrey School of Computer Science, Acadia University

SLIDE 2

Acknowledgements These slides borrow from those prepared for “Operating System Concepts” (eighth edition) by Silberschatz, Galvin and Gagne. These slides borrow lightly from those prepared for COMP 3713 by Dr. Darcy Benoit.

SLIDE 3

Chapter 4

Threads

SLIDE 4

What is a thread?

  • A thread is a “unit” of CPU utilization
    – it shares code, non-stack data, and other resources (such as open files) with the other threads of its process
  • Multiple threads can be associated with a single process
  • A thread is also referred to as a lightweight process
  • The resources saved by threads being “lightweight” can be used for other processing
  • Unlike traditional (“heavyweight”) processes, threads can do more work with fewer resources (on average)

SLIDE 5

Single and Multithreaded Processes

SLIDE 6

Benefits of Using Multiple Threads

  • A single-threaded process has only one sequence of instructions being executed
    – if a single-threaded process has to deal with (say) multiple input sources, it must have some facility for watching all of them concurrently (*cough*, select(), poll())
    – a complex single-threaded process may be hard to write, debug and maintain (or so say the people who don’t understand select())
  • Using a multi-threaded process can have these benefits:
    – improved responsiveness (e.g., a process’ UI can still respond “instantly” even if another thread is crunching or blocked) *cough*
    – resource sharing: only one copy of a process’ code/data is required in memory; no need to use (explicit) shared memory functions
    – economy: cheaper to create and use threads than processes (30X for creation, 5X for context switch in Solaris)
    – scalability: threads can run in parallel on multiple cores

SLIDE 7

Multicore Programming

  • Multicore systems put pressure on programmers to use them efficiently
  • Challenges include:
    – balance: it is undesirable to have one thread or process that does 99% of the work
    – data splitting: the data used by separate processes/threads should be separable to different cores (GEQ: Why?)
    – data dependency: when data is shared by two or more threads, the processes/threads must be properly synchronized
    – testing and debugging: testing and debugging concurrent processes/threads is (much?) more difficult than single-threaded processes (GEQ: Why?)

SLIDE 8

Multithreaded Server Architecture

  • Better yet: have a pool of worker threads waiting for something to do

SLIDE 9

Execution: Single Core vs. Multicore

[figure: interleaved execution on a single core vs. parallel execution on multiple cores]

SLIDE 10

Threads: User and Kernel

  • User threads:
    – threads are managed without kernel support
    – thread management is done by a user-level threads library
  • Kernel threads:
    – threads are supported and managed directly by the kernel
    – implemented and used by most (all?) current OSes
  • The three primary thread libraries are POSIX Pthreads, Win32 threads, and Java threads
  • All threads in one process must get CPU time; how?
    1: the thread library does the scheduling and dispatching of a given process’ threads; or
    2: the kernel knows about, schedules and dispatches the threads
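To make kernel-scheduled threads concrete, here is a minimal Pthreads creation/join sketch (the worker function and its message are invented for illustration; compile with cc -pthread):

#include <pthread.h>
#include <stdio.h>

/* Hypothetical worker: each thread just prints its argument */
static void *worker(void *arg)
{
    printf("thread %ld running\n", (long) arg);
    return NULL;
}

int main(void)
{
    pthread_t tid;

    /* On a 1-1 system such as Linux, this creates a kernel-scheduled thread */
    if (pthread_create(&tid, NULL, worker, (void *) 1L) != 0)
        return 1;
    pthread_join(tid, NULL);    /* wait for the thread to finish */
    return 0;
}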

SLIDE 11

Many-to-One Threading Model

  • Many user threads mapped to one kernel thread

SLIDE 12

One-to-One Threading Model

  • Benefits of 1–1:
    – when one thread makes a blocking call, the process’ other threads can still run
    – can make use of multiple processors/cores
  • Drawback of 1–1: need to create a kernel thread for every user-space thread, so more overhead involved

  • many-1: Solaris “green threads”
  • 1–1: Linux, ms-windows, Solaris > 8,. . .
  • Q: Is there something (possibly) better overall?

SLIDE 13

Many-to-Many Threading Model

  • Allows many user level threads to be mapped to many kernel threads

  • Allows the operating system to create a “sufficient number” of kernel

threads

SLIDE 14

Two-level Model

  • Similar to many-to-many, except that it allows one or more user threads

to be bound to their own kernel thread

  • Examples: IRIX, HP-UX, Tru64 UNIX (formerly Digital UNIX), Solaris 8

and earlier

SLIDE 15

Thread Libraries

  • Thread libraries provide programmers with APIs for creating and

managing threads

  • There are two primary ways of implementation:
    – user-level library entirely in user space
    – kernel-level library supported by the OS
  • “Pthreads” is one such library
    – may be provided either as user-level or kernel-level
    – a POSIX standard (IEEE 1003.1c) API for thread creation and synchronization
    – common in UNIX operating systems (Solaris, Linux, MacOS)

SLIDE 16

Java Threads

  • Java threads are managed by the JVM
  • Typically implemented using the threads model provided by the underlying OS
  • Two ways to create a Java thread:
    – implementing the Runnable interface — see textbook
    – extending the Thread class and using its start() method:

class Worker extends Thread {
    public void run() {
        System.out.println("I am a worker thread, woe is me");
        /* Do something useful here? */
    }
}

public class First {
    public static void main(String args[]) {
        Worker w = new Worker();
        w.start();
        System.out.println("I am main()");
    }
}

SLIDE 17

Issues with Threads

  • Does the use of threads affect the semantics of system calls?

    – e.g., does fork() create a new process with all the threads, or only one thread?
    – e.g., does exec() replace all the threads or just one thread?

  • Thread cancellation of target thread: asynchronous or deferred?
  • Signal handling
  • Thread pools

  • Thread-specific data

    – support is needed for threads to have private data (the stack is private)
  • Scheduler activations
    – are threads scheduled individually or as a group?
    – e.g., do two threads get two time slices?

  • Should all threads in a given process have the same priority?

SLIDE 18

Thread Cancellation

  • Terminating a thread before it has finished
  • Two general approaches:

    – asynchronous cancellation terminates the target thread immediately
    – deferred cancellation allows the target thread to periodically check if it should be cancelled

  • Issue: what if a thread is asynchronously cancelled while updating data other threads are using?

SLIDE 19

Signal Handling (Unix)

  • A signal is a low-level way to notify a process that some event has occurred
    – programs are able to ignore (most) signals

  • Signals are handled by a signal handler

– it can be the default handler or a user-defined handler

  • What about multithreaded processes?
    – deliver the signal to the appropriate thread? (Which one is appropriate?)
    – deliver the signal to all threads?
    – deliver the signal to certain threads?
    – assign one thread the job of handling signals?
  • See man 2 signal, man 7 signal, man 2 kill, man 2 sigaction, and their many, many friends
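As a minimal sketch of the sigaction() interface mentioned above (the handler body is invented for illustration):

#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: note the arrival of SIGINT.
   write() is used because it is async-signal-safe; printf() is not */
static void on_sigint(int signo)
{
    (void) signo;
    write(1, "caught SIGINT\n", 14);
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;     /* user-defined handler */
    sigemptyset(&sa.sa_mask);      /* block nothing extra while handling */
    sigaction(SIGINT, &sa, NULL);  /* install it */
    pause();                       /* wait for a signal to arrive */
    return 0;
}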

SLIDE 20

Chapter 5

Process Synchronization

SLIDE 21

Background

  • Concurrent access to shared data may result in data inconsistency
    – this could be data shared by multiple processes using shared memory
  • Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes
  • Example: suppose that we wanted to provide a solution to the consumer-producer problem that fills all the buffers
    – we can do so by keeping an integer count that tracks the number of full buffers
    – initially, count is set to 0
    – the producer increments count after filling a slot; the consumer decrements it after emptying one

SLIDE 22

Producer Process

  • Producer and consumer are sharing memory (buffer[BUFFER_SIZE] and count)

in = 0;    // index where the next item will be placed
while (true) {
    /* Produce an item and put it in nextProduced */
    while (count == BUFFER_SIZE)
        ;    /* do nothing but wait for an empty slot */
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    count++;
}

  • Note: a busy wait like this should be avoided whenever possible

SLIDE 23

Consumer Process

out = 0;    // index where the next item will be removed from
while (true) {
    while (count == 0)
        ;    /* do nothing but wait for a filled slot */
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    count--;
    /* "Consume" the item in nextConsumed */
}

  • The busy wait is equally ugly here
  • Q: does it matter where the “count--;” statement is, as long as it

follows the inner loop?

  • Q: does it matter how the C compiler implements “count--;”?

SLIDE 24

Possible Race Condition in That Code

  • count++ could be implemented as
        R1 = count     ; R1 is register 1
        R1 = R1 + 1
        count = R1
  • count-- could be implemented as
        R2 = count     ; R2 is register 2
        R2 = R2 - 1
        count = R2
  • Consider this execution interleaving, given that initially count = 5:
    – S0: producer executes R1 = count       (R1 = 5)
    – S1: producer executes R1 = R1 + 1      (R1 = 6)
    – S2: consumer executes R2 = count       (R2 = 5)
    – S3: consumer executes R2 = R2 - 1      (R2 = 4)
    – S4: producer executes count = R1       (count = 6)
    – S5: consumer executes count = R2       (count = 4)
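This lost update is easy to demonstrate on real hardware; the following self-contained sketch (all names invented) usually prints a final count well below the expected 2000000 (compile with cc -pthread):

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static long count;    /* shared and deliberately unprotected */

static void *inc(void *arg)
{
    (void) arg;
    for (int i = 0; i < N; i++)
        count++;      /* load, add, store: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, inc, NULL);
    pthread_create(&t2, NULL, inc, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("count = %ld (expected %d)\n", count, 2 * N);
    return 0;
}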

SLIDE 25

Critical Sections

  • There is a so-called critical section in the previous code:

– if count++ and/or count-- are not performed atomically, Bad Things can happen

  • In this example, count might end up with the wrong value

    – the producer could think the buffer is full when there is actually an empty slot

  • In other cases, data could be entirely lost or corrupted
  • When multiple processes are cooperating through the use of shared

resources, great care must be taken to avoid such problems

SLIDE 26

Solution to Critical-Section Problem

Three criteria must be satisfied:
(1) Mutual Exclusion: if some process P is executing a critical section, then no other process can be executing a critical section
(2) Progress: if no process is executing a critical section and some processes wish to enter a critical section, then the selection of the processes that will enter the critical section next cannot be postponed indefinitely
(3) Bounded Waiting: a bound must exist on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted
  – no assumption is made concerning the relative speeds of the N processes

SLIDE 27

Peterson’s Solution: A Software Critical-Section Solution

  • Suppose there are two processes, P0 and P1
  • This solution assumes (*cough*) that the LOAD and STORE instructions are atomic; that is, they cannot be interrupted (and that there are no cache issues)

  • The two processes share two variables:

  • The variable turn indicates whose turn it is to enter the critical section
  • The flag array is used to indicate if a process is ready to enter the

critical section; flag[i] = true implies that process Pi is ready

/* The code for process i */
other_p = 1 - i;
do {
    flag[i] = TRUE;       // I want in!
    turn = other_p;       // Give other process priority
    while (flag[other_p] && turn == other_p)
        ;
    <critical section>
    flag[i] = FALSE;      // I’m out!
    <remainder section>
} while (TRUE);

SLIDE 28

Solution to Critical-Section Problem Using Locks

  • A lock is an abstract concept which at most one process can have/own

at a time

  • This general structure can be used by any process which uses a shared resource:

execute any initialization code
while (TRUE) {
    acquire lock
    execute critical-section code
    release lock
    execute any post-critical-section code
};

SLIDE 29

Synchronization Hardware

  • Many systems provide hardware support for critical section code
  • On uniprocessors you could disable interrupts
    – generally this is too inefficient on multiprocessor systems
    – operating systems using this approach are not broadly scalable
  • Modern machines provide special atomic hardware instructions which can be used to provide protection for critical sections
    – two “common” choices:
      – test memory word and set value (“test and set”)
      – swap the contents of two memory words (“swap”)

SLIDE 30

TestAndSet Instruction

  • One conceptualization:

Boolean TestAndSet(Boolean * target)
{
    Boolean rv = *target;
    *target = TRUE;
    return rv;
}

  • This code is not atomic, but imagine a single machine instruction that

does this atomically

  • The use of the word “test” is somewhat misleading, since there is no

test within the instruction itself

  • Some architectures define this instruction to work on a single bit (rather

than a byte or word) for the case where multiple independent flag bits are used in the same byte or word

SLIDE 31

Critical Section Solution using TestAndSet

  • Use a shared Boolean variable lock, initialized to FALSE
  • Solution:

<initialization code>
while (TRUE) {
    while (TestAndSet(&lock))
        ;    // do nothing
    <critical-section code here>
    lock = FALSE;
    <post-critical-section code>
};
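Real hardware does provide such instructions; as one hedged illustration, the same spinlock written with C11’s standard atomic_flag (which compiles down to an atomic test-and-set style instruction):

#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;    /* starts clear (FALSE) */

void enter_critical(void)
{
    /* atomically sets the flag and returns its previous value */
    while (atomic_flag_test_and_set(&lock))
        ;    /* spin until the previous value was clear */
}

void leave_critical(void)
{
    atomic_flag_clear(&lock);    /* lock = FALSE */
}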

SLIDE 32

Swap Instruction

  • One conceptualization:

void Swap(Boolean * a, Boolean * b)
{
    Boolean temp = *a;
    *a = *b;
    *b = temp;
}

  • Once again, this code is not atomic, but imagine a single machine

instruction that does this atomically

SLIDE 33

Critical Section Solution using Swap

  • There is a shared Boolean variable lock initialized to FALSE; each

process has a local Boolean variable key

  • Solution:

<initialization code>
while (TRUE) {
    Boolean key = TRUE;
    while (key == TRUE)      // Ugly busy wait
        Swap(&lock, &key);
    <critical-section code here>
    lock = FALSE;
    <post-critical-section code>
};

SLIDE 34

Bounded-Waiting Mutual Exclusion with TestAndSet

  • Consider the case of n processes wanting access to the critical section

/* Code for process i */
while (TRUE) {
    waiting[i] = TRUE;
    key = TRUE;
    while (waiting[i] && key)    // Ugly busy wait
        key = TestAndSet(&lock);
    waiting[i] = FALSE;
    <critical-section code here>
    j = (i + 1) % n;
    while (j != i && !waiting[j])
        j = (j + 1) % n;
    if (j == i)
        lock = FALSE;
    else
        waiting[j] = FALSE;
    <post-critical-section code>
};

SLIDE 35

Semaphores

  • Semaphores are a synchronization tool that do not require busy waiting
  • A semaphore uses (“is”?) an integer variable S
  • Two standard operations modify S: wait() and signal()

  • Usage is (a bit) less complicated than previous solutions
  • A semaphore can only be accessed via two indivisible (atomic) operations:

wait(S) {
    while (S <= 0)
        ;    // no-op
    S--;
}

signal(S) {
    S++;
}

SLIDE 36

Semaphores as General Synchronization Tools

  • Counting semaphore: an integer value which can range over an

unrestricted domain

  • Binary semaphore: an integer value which can range only between 0 and 1; can be simpler to implement
  • Binary semaphores are also known as mutex locks (“mutual exclusion”)

Semaphore mutex = 1;    // Initialization code
while (true) {
    wait(mutex);
    // Critical section code here
    signal(mutex);
    // Post-critical-section code
};

SLIDE 37

Semaphore Implementation: 1

  • Problem: the given definition of wait() may require busy waiting
    – this is A Very Bad Thing
  • Solution: modify wait() to avoid spinning as follows:
    – when wait() finds that the semaphore is ≤ 0, rather than spinning it should block itself
    – it is awakened when the semaphore becomes positive
  • To implement this, each semaphore has an associated queue
    – now signal() must examine the queue and wake up the first process (if there is one waiting)

SLIDE 38

Semaphore Implementation: 2

  • Implement a semaphore with the following struct:

typedef struct {
    int value;
    struct process * list;
} semaphore;

  • When a struct is initialized, both the value and the list must be

appropriately initialized

  • Must have a block(void) system call which blocks the calling process
  • Also need a wakeup(process_reference) system call which wakes up

the given process

SLIDE 39

Semaphore Implementation: 3

  • Implementation of wait():

wait(semaphore * S)
{
    S->value--;
    if (S->value < 0) {
        add this process P to S->list;
        block();
    }
}

  • Implementation of signal():

signal(semaphore * S)
{
    S->value++;
    if (S->value <= 0) {
        remove a process P from S->list;
        wakeup(P);
    }
}

  • But wait! We still have a problem. . .

SLIDE 40

Semaphore Implementation: 4

  • We can’t allow more than one process to be executing signal() or

wait() on a given semaphore at a given time

    – we could turn off interrupts on a uniprocessor system
    – problematic on a multiprocessor system

  • So use a spinlock (!)
  • The use of a spinlock is acceptable in this case

    – no process will be spinning for long waiting to enter the critical section
    – so the busy wait corresponds to the critical section “entry time”, not the critical section “execution time”

  • Q: Any problems here?
  • GEQ: where is the spinlock released in wait() ?
  • Q: How do you ensure atomicity on an SMP system?

SLIDE 41

Synchronization Issues (Even with Semaphores)

  • It might seem that having semaphores solves all of our problems
  • Sadly, it isn’t so
  • The next few slides outline some of the problems which remain

SLIDE 42

Deadlock and Starvation

  • Deadlock: two or more processes are waiting indefinitely for an event

that can be caused by only one of the waiting processes

  • E.g.: let S and Q be two semaphores initialized to 1.

Let P0 and P1 be two processes that execute the following code:

      P0                 P1
    wait(S);           wait(Q);
    wait(Q);           wait(S);
      ...                ...
    signal(S);         signal(Q);
    signal(Q);         signal(S);

– if P0 executes wait(S) and P1 executes wait(Q) before P0 executes wait(Q) there is a problem: neither will ever be awakened

  • Starvation (indefinite blocking): a process might never be removed from the semaphore queue in which it is suspended
    – e.g., this can happen if processes are removed from the queue in LIFO order

SLIDE 43

The Priority Inversion Problem

  • We may assign time-critical processes (or other “important” processes) high priorities

  • Suppose a low-priority process enters a critical section but a high-priority process then wants the same resource
    – the high-priority process must then wait for the low-priority process to release the resource
    – this is known as priority inversion
  • It can be worse: the low-priority process can be preempted by some medium-priority process, keeping the high-priority process waiting even longer
    – a common fix is the priority-inheritance protocol: the low-priority process temporarily inherits the higher priority until it releases the resource

SLIDE 44

Classic Problem 1: Bounded Buffer

  • Overview:
    – there is a producer filling buffers
    – there is a consumer consuming buffers
  • We can implement this using the following semaphores:
    – semaphore mutex initialized to 1 (protects the shared buffer)
    – semaphore full initialized to the value 0 (think of this semaphore representing the idea “there is at least one full buffer location”)
    – semaphore empty initialized to the value N (think of this semaphore representing the idea “there is at least one empty buffer location”)

SLIDE 45

Bounded Buffer: Producer and Consumer Processes

  • Producer process:

while (true) {
    // produce an item
    wait(empty);
    wait(mutex);
    // add the item to the buffer
    signal(mutex);
    signal(full);
};

  • Consumer process:

while (true) {
    wait(full);
    wait(mutex);
    // remove an item from the buffer
    signal(mutex);
    signal(empty);
    // consume the just-removed item
};

SLIDE 46

Classic Problem 2: Readers-Writers Problem

  • Overview:

    – readers: only read the data set; they do not perform any updates
    – writers: can both read and write

  • Problem: allow multiple readers to read at the same time,

but only one writer can access the shared data at a given time

  • Problem V1: readers can begin reading whenever other readers are

already reading

  • Problem V2: readers can NOT begin reading if a writer is waiting
  • We can implement this with the following shared data:
    – semaphore mutex initialized to 1
    – semaphore wrt initialized to 1
    – integer readcount initialized to 0

SLIDE 47

Readers-Writers Problem: Processes (Solution to Problem V1)

  • Writer process:

while (true) {
    // generate data to write
    wait(wrt);
    // write that data
    signal(wrt);
};

  • Reader process:

while (true) {
    wait(mutex);
    readcount++;
    if (readcount == 1)
        wait(wrt);
    signal(mutex);
    // reading is performed
    wait(mutex);
    readcount--;
    if (readcount == 0)
        signal(wrt);
    signal(mutex);
};

  • Note how mutex is used to ensure that the (otherwise non-atomic) examination and modification is properly protected

SLIDE 48

Classic Problem 3: The Dining Philosophers Problem

  • The data is the bowl of rice
  • Each philosopher needs two chopsticks to eat
  • Implement with a semaphore array chopstick[5] initialized to all 1’s
  • More realistic example: a process needs two optical drives at the same

time to copy a CD or DVD (& system has 5 optical drives)

SLIDE 49

The Dining Philosophers Problem: Candidate Solution

  • Philosopher i’s algorithm:

while (true) {
    wait(chopstick[i]);
    wait(chopstick[(i + 1) % 5]);
    // eat
    signal(chopstick[i]);
    signal(chopstick[(i + 1) % 5]);
    // think
};

  • But if every philosopher picks up (say) his right-hand chopstick at the

same time, we get deadlock: all will starve to death

  • Need a solution to this!
    – S1: only allow a philosopher to pick up chopsticks if both are available?
    – S2: odd philosophers grab left chopstick first, even right?
    – S3: only allow 4 philosophers to sit at the same time?
  • A given philosopher still might starve with any of these

SLIDE 50

(Alleged) Problems with Semaphores

  • Recall there are two semaphore operations: wait() and signal()
  • Q: is it signal() . . . wait() or wait() . . . signal() ?
    – this may have been more of a problem when they were called P() and V()

  • What happens if you use wait() . . . wait() ?
  • What happens if you forget wait() ?
  • What happens if you forget signal() ?
  • Yes, these are problems, but if you don’t understand the API for any

package/system/class/library you use, you will be in trouble

SLIDE 51

Monitors

  • A high-level abstraction that provides a (*cough*) convenient and effective

mechanism for process synchronization

  • Idea: a monitor is similar to a class, but only one process may be active

“within” the monitor at a time

monitor monitor-name
{
    // shared variable declarations

    procedure P1(...) { ... }
    ...
    procedure Pn(...) { ... }

    Initialization code(...) { ... }
}

SLIDE 52

Revolutionary Egg-Shaped View of a Monitor

  • Only one process can be in a monitor at a given time

SLIDE 53

Monitors: Condition Variables

  • By themselves, monitors do not suffice for all purposes, so we add condition variables:
  • condition x, y, z;
  • There are two operations on a condition variable:
    – x.wait(): a process that invokes this operation is suspended (until awakened with x.signal())
    – x.signal(): if any processes have invoked x.wait(), this resumes one of them
  • If you aren’t careful you could confuse condition vars with semaphores

  • GEQ: what’s the difference between a condition variable and a

semaphore?
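One way to see the difference: a Pthreads condition-variable sketch (all names invented). Unlike a semaphore, a cond-var signal with no waiter is simply lost, so the actual state must live in a separate flag checked under the mutex:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static bool ready = false;           /* the monitor's shared state */

void waiter(void)
{
    pthread_mutex_lock(&m);
    while (!ready)                   /* re-check: wakeups can be spurious */
        pthread_cond_wait(&cv, &m);  /* atomically releases m while blocked */
    /* ... use the shared state ... */
    pthread_mutex_unlock(&m);
}

void signaller(void)
{
    pthread_mutex_lock(&m);
    ready = true;
    pthread_cond_signal(&cv);        /* wake one waiter, if any */
    pthread_mutex_unlock(&m);
}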

SLIDE 54

Revolutionary Egg-Shaped View of a Monitor with Condition Variables

SLIDE 55

Monitor Solution to Dining Philosophers Problem: 1

monitor DP
{
    enum {THINKING, HUNGRY, EATING} state[5];
    condition self[5];

    void pickup(int i) {
        state[i] = HUNGRY;
        test(i);
        if (state[i] != EATING)
            self[i].wait();
    }

    void putdown(int i) {
        state[i] = THINKING;
        // test left and right neighbours
        test((i + 4) % 5);
        test((i + 1) % 5);
    }

    void test(int i) {
        if (state[(i + 4) % 5] != EATING && state[i] == HUNGRY
                && state[(i + 1) % 5] != EATING) {
            state[i] = EATING;
            self[i].signal();
        }
    }

    initialization_code() {
        for (int i = 0; i < 5; i++)
            state[i] = THINKING;
    }
}

SLIDE 56

Solution to Dining Philosophers Problem: 2

  • Each philosopher i invokes the operations pickup() and putdown() in the following sequence:

DiningPhilosophers.pickup(i);
// EAT
DiningPhilosophers.putdown(i);

  • This solution is deadlock-free
  • BUT: a philosopher can still starve
  • Q: how can you prevent starvation?
  • Another issue: when process P1 does a signal() on a condition variable, it is inside the monitor
    – but waking another process P2 will allow P2 to be executing inside the monitor (which would be two processes executing inside the monitor, which is a no-no)
    – must deal with this ugly implementation issue: see text

SLIDE 57

(Very Short) Case Study: Synchronization in Linux

  • Prior to kernel Version 2.6, Linux was non-preemptive
    – i.e., a process running in kernel mode could not be preempted, even by a higher-priority process
    – this was not good for “real-time” capability
  • Version 2.6 and later: fully preemptive
  • Linux provides semaphores and spin locks
    – spin locks are used for short-term protection; the system is designed so these are not held for long
    – for “longer-term” synchronization semaphores are used

SLIDE 58

Pthreads Synchronization

  • Available to “user level” processes

  • API:

#include <pthread.h>

pthread_mutex_t mutex;

/* create the mutex lock */
pthread_mutex_init(&mutex, NULL);    // NULL == init to default attrs

/* acquire the mutex lock */
pthread_mutex_lock(&mutex);

/* CRITICAL SECTION */

/* release the mutex lock */
pthread_mutex_unlock(&mutex);

  • Some systems also provide semaphores (POSIX “SEM” extension)
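A hedged sketch of that POSIX semaphore extension, using an unnamed semaphore (link with -pthread; on some older systems also -lrt):

#include <semaphore.h>
#include <stdio.h>

int main(void)
{
    sem_t s;

    sem_init(&s, 0, 1);    /* 0 == not shared across processes; count = 1 */

    sem_wait(&s);          /* wait(): decrement, blocking while the count is 0 */
    printf("in the critical section\n");
    sem_post(&s);          /* signal(): increment, waking a waiter if any */

    sem_destroy(&s);
    return 0;
}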

SLIDE 59

Chapter 6

CPU Scheduling

SLIDE 60

CPU Scheduling

  • Basic idea: the CPU is a valuable resource which we want to fully utilize
    – multiprogramming: have multiple processes running concurrently so that when one is waiting (usually for I/O) the CPU can be doing other work
  • However, in a time-sharing system, it is desirable to give quick response to processes which are interacting with a human
    – thus we need to be able to take the CPU away from a compute-bound process before it has to wait for I/O
  • Poor scheduling of the CPU can
    – decrease system utilization
    – give poor response to interactive users
  • Thus CPU scheduling is a critical design aspect of an operating system

SLIDE 61

CPU and I/O “Bursts”

  • Processes tend to alternate sequences of CPU processing and I/O
    – do some computation (a “CPU burst”)
    – do some I/O
    – repeat those two steps N times

SLIDE 62

Histogram of CPU Burst Times

  • This varies wildly with program, computer, . . .
  • BUT the distribution is important when designing a scheduling algorithm

SLIDE 63

CPU (a.k.a. “Short-Term”) Scheduler

  • The CPU scheduler selects from among the processes in memory that

are ready to execute, and allocates the CPU to one of them

  • CPU scheduling decisions may take place when a process:
    – switches from running to ready (e.g., an interrupt occurs)
    – switches from waiting to ready
    – switches from running to waiting state (e.g., I/O request or wait())
    – terminates
  • Scheduling for the last two reasons is said to be nonpreemptive
  • All other scheduling is preemptive

SLIDE 64

Preemptive vs. Nonpreemptive Scheduling

  • Preemptive scheduling
    – ensures that no process can hog all of the CPU time†
  • Some “OS”es (DOS/Windoze 3.1, Mac OS versions < X) used nonpreemptive scheduling
  • One advantage of nonpreemptive scheduling (on single-CPU machines!):
    – a process could make sure that it has made all required changes to shared data before giving up the CPU
    – if such a process was preempted, it might have only made part of the changes
  • In the case of preemptive systems, processes using shared data need to guard against this possibility

† notwithstanding scheduling priorities

SLIDE 65

Dispatcher

  • The dispatcher module gives control of the CPU to the process selected by the short-term scheduler; this involves:
    – switching to the context of the chosen process (this includes register values, stack values, etc.)
    – switching from monitor mode (which the dispatcher runs in) to user mode (for the process to run in)
    – jumping to the proper location in the user program to restart that program
  • The dispatch latency is the time it takes for the dispatcher to stop one process and start another running

SLIDE 66

Scheduling Criteria

1. CPU utilization: what percentage of the time the CPU is busy
2. Throughput: the number of jobs completed per unit time
3. Turnaround time: the amount of time to execute a particular job
4. Waiting time: the amount of time a process has been waiting in the ready queue
5. Response time: the amount of time it takes from when a request was submitted until the first response (not all of the output!) is produced
   – for time-sharing environments, where the impatient human wants confirmation that something is happening
   – pacifiers: animated icons, splash screens, . . .

SLIDE 67

Scheduling Algorithm Optimization Criteria

  • Maximize CPU utilization
  • Maximize throughput
  • Minimize turnaround time
  • Minimize waiting time
  • Minimize response time
  • These criteria work against each other

  • Q: which do you think is the most important criterion for the OS you like to use?

  • Q: for an interactive situation, should you minimize the average

response time or the variance of the response time?

SLIDE 68

First Come, First Served (FCFS) Scheduling

  • Idea: the first job that is ready gets the CPU until it is done
    – arguably fair, from one point of view
  • Suppose you have the following jobs to process, arriving in this order:
        P1 — 24 units of time
        P2 — 3 units of time
        P3 — 3 units of time
  • FCFS runs P1, then P2, then P3; they finish at times 24, 27 and 30
  • Waiting times are 0, 24 and 27
  • Average wait time is (0 + 24 + 27)/3 = 17
  • But if the schedule is P2, P3, P1 then the average wait time is (0 + 3 + 6)/3 = 3

SLIDE 69

Shortest-Job-First (SJF) Scheduling

  • Associate with each process the length of its next CPU burst
    – schedule the job with the shortest next CPU burst first
    – ties can be broken with some rule (such as FCFS)
  • SJF is optimal: it gives the minimum average waiting time for a given set of processes
    – it avoids the so-called “convoy effect” of the “smaller” processes lining up behind the “larger” processes

  • The difficulty is knowing the length of the next CPU burst

(details, details)

  • Q: But what might happen to a job whose next CPU burst is large?

SLIDE 70

SJF Example

  • Suppose P1, P2, P3 and P4 have become ready in that order, and their burst times are as follows:
        P1 — 6    P2 — 8    P3 — 7    P4 — 3
  • SJF will schedule them as follows: P4 (0–3), P1 (3–9), P3 (9–16), P2 (16–24)
  • Average SJF waiting time is (3 + 16 + 9 + 0)/4 = 7
  • Average FCFS waiting time is (0 + 6 + 14 + 21)/4 = 10.25

SLIDE 71

SJF Issues

  • SJF (provably) gives the optimal solution for average wait time
  • How do we get the length of the next CPU burst?
  • Do we (re-)do this scheduling from scratch every time we need to select a new job?

  • Starvation occurs when a process needs some resource (such as CPU, disk, printer, etc.) and it is never allocated the resource
    – although it may increase the average waiting time, longer processes must be allowed to access needed resources at some point in time

  • Q: if a job is being processed and another job with a lower burst time becomes ready, should the first job be preempted?
    – doing so will result in lower (thus better) average response times

SLIDE 72

Finding the Length of CPU Bursts

  • In general, this is impossible to predict with 100% certainty
  • But we can estimate the next length from the previous lengths
  • One technique: exponential averaging

  • Let
        tn   = actual length of the nth CPU burst
        τn+1 = predicted length of the next CPU burst
        α    = some constant where 0 ≤ α ≤ 1
  • Then estimate τn+1 with
        τn+1 = α tn + (1 − α) τn

  • As α gets larger, τn+1 more closely resembles tn
  • As α gets smaller, the “history” of {tn} has greater effect
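A tiny sketch of the update rule in C; the initial guess, α, and the burst values are arbitrary illustrative choices:

#include <stdio.h>

/* One step of exponential averaging: tau' = alpha*t + (1 - alpha)*tau */
static double predict_next(double alpha, double t_actual, double tau_prev)
{
    return alpha * t_actual + (1.0 - alpha) * tau_prev;
}

int main(void)
{
    double tau = 10.0;    /* arbitrary initial guess for the first burst */
    double bursts[] = { 6, 4, 6, 4, 13, 13, 13 };

    for (int i = 0; i < 7; i++) {
        tau = predict_next(0.5, bursts[i], tau);    /* alpha = 1/2 */
        printf("after burst of %2.0f: predict %.2f\n", bursts[i], tau);
    }
    return 0;
}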

SLIDE 73

Exponential Averaging of Burst Lengths

SLIDE 74

Priority Scheduling

  • Idea: a priority number (integer) is associated with each process
    – range of numbers can be 1–7, 0–4095, . . .
  • Policy: the CPU is allocated to the process with the highest priority
  • Can use this in two ways:
    – preemptive
    – nonpreemptive
  • SJF is a priority scheduling where the priority is the prediction of the next CPU burst time
  • Problem: starvation ⇒ low-priority processes may never execute
    – a standard fix is aging: gradually increase the priority of processes that have waited a long time

SLIDE 75

Round Robin (RR)

  • Idea: each process gets a small unit of CPU time (time quantum),

usually 10–100 milliseconds

  • After this time has elapsed, the process is preempted and added to the

end of the ready queue

  • If there are n processes in the ready queue and the time quantum is q,

then each process gets 1/n of the CPU time in chunks of at most q time units at once

  • No process waits more than (n − 1)q time units for its (next) turn at the

CPU

  • Performance:
    – as q becomes very large, RR degenerates to FCFS
    – as q decreases towards the time required to perform a context switch, the percentage of time spent on overhead becomes large

SLIDE 76

Example of RR with Time Quantum 4

  • Suppose:
        P1 has a burst time of 24
        P2 has a burst time of 3
        P3 has a burst time of 3
  • These would be processed as follows: P1 (0–4), P2 (4–7), P3 (7–10), then P1 gets the remaining quanta back-to-back (10–14, 14–18, 18–22, 22–26, 26–30)
  • Note that both P2 and P3 finished before their entire time slot (quantum) was used up

SLIDE 77

RR Issues

  • The previous slides ignore the overhead of scheduling and dispatching a new process
    – suppose the time slice is either 1ms, 10ms or 100ms, and the schedule + dispatch time is 100µs
    – overhead for 100ms quantum: 0.1%
    – overhead for 10ms quantum: 1%
    – overhead for 1ms quantum: 10%
  • Compared to SJF, RR typically has
    – a longer average turnaround time, but
    – a better response time
  • Q: how big should the RR time quantum be to optimize
    – response time?
    – turnaround time?
    – average wait time?

SLIDE 78

RR Turnaround Time: NOT Monotone in Time Quantum

SLIDE 79

Multilevel Queue Scheduling

  • The ready queue is partitioned into separate queues; e.g.,
    ⇒ foreground (interactive)
    ⇒ background (batch)
  • Each queue may have its own scheduling algorithm
    – foreground — RR
    – background — FCFS
  • The queues themselves must be scheduled
    – fixed priority scheduling (e.g., serve all from foreground then from background)
    – there is the possibility of starvation for background tasks
    – time slice: each queue gets a certain amount of CPU time which it can schedule amongst its processes
    – e.g., 80% to foreground in RR, 20% to background in FCFS

SLIDE 80

Multilevel Queue Scheduling: 5 Queue Example

SLIDE 81

Multilevel Feedback Queue

  • Idea: allow a process to move between the various queues; aging can be

implemented this way

  • A multilevel-feedback-queue scheduler can be defined by the following parameters:
    – number of queues
    – scheduling algorithms for each queue
    – method used to determine when to upgrade a process
    – method used to determine when to demote a process
    – method used to determine which queue a process will enter when that process needs service

SLIDE 82

Multilevel Feedback Queue Example

  • Jobs entering the system get put in Q0, where there is an 8ms quantum
  • Jobs exceeding 8ms get moved to Q1, where there is a 16ms quantum
  • Jobs exceeding 24ms get moved to Q2, where they are scheduled with FCFS

  • Result: jobs with small CPU bursts are dealt with quickly
  • Jobs with somewhat larger CPU bursts are dealt with fairly quickly
  • Jobs with large CPU bursts have to wait their turn
  • Is response time of such a system going to be good?

SLIDE 83

Thread Scheduling: 1

  • There is a distinction between user-level and kernel-level threads
    – with user-level threads, the (user space) thread library handles the thread scheduling itself
  • In situations where the system implements kernel-level threads, the kernel actually schedules threads, not processes
  • Kernel threads: recall there are many-to-one, many-to-many, and one-to-one models

SLIDE 84

Thread Scheduling: 2

  • In many-to-one and many-to-many scenarios, the thread library schedules user-level threads to run on a lightweight process (LWP)
    – this is known as process-contention scope (PCS), since scheduling competition is within the process
    – PCS scheduling is typically done by thread priority

  • A kernel thread is scheduled onto an available CPU in the system-contention scope (SCS)

  • Systems with one-to-one threading (Linux, . . . ) use SCS only
  • Note: the pthread API allows specifying either PCS or SCS, but not all

systems implement both choices
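A hedged sketch of the contention-scope part of that API (on Linux only PTHREAD_SCOPE_SYSTEM is supported, so asking for process scope fails):

#include <pthread.h>
#include <stdio.h>

static void *task(void *arg) { (void) arg; return NULL; }

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* Ask for system contention scope (SCS) */
    if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0)
        fprintf(stderr, "SCS not supported here\n");

    pthread_create(&tid, &attr, task, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}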

SLIDE 85

Multiple-Processor Scheduling

  • CPU scheduling is more complex when multiple CPUs are available
  • Issues:
    – are all the CPUs identical?
  • If CPUs are identical, it often makes sense for them to share a queue (or queue structure)
    – this prevents the situation in which one CPU is idle while there are tasks waiting in the other’s ready queue
  • There are two forms of scheduling used in multi-processor systems:
    – asymmetric multiprocessing
    – symmetric multiprocessing

SLIDE 86

Asymmetric Multiprocessing

  • One CPU is the “master”, the other(s) are the “slaves”
  • The master takes care of
    – scheduling decisions
    – I/O (and other system activities)

  • The master “farms out” the CPU bursts to (the) other CPU(s)

  • This method is easy to implement

  • But. . . the asymmetry means that the master might have lots of work to do while one or more slaves are idle
    – thus overall CPU utilization is not as good as it could be

SLIDE 87

Symmetric Multiprocessing (SMP)

  • Each processor handles its own scheduling

  • Each processor could have its own queue
    – advantage: if the cache memory is not shared amongst the processors, it is best (all other things considered) if a process runs on the same processor next time as it did last time
    – this is known as processor affinity
    – soft affinity: try to keep a process on the same processor, but don’t guarantee it
    – hard affinity: insist that a process stays on the same processor
    – Linux has soft affinity by default, but the sched_{s,g}etaffinity calls support hard affinity

  • The scheduling code must be carefully written so that two CPUs don’t

access and modify the queue simultaneously

  • Issue: what if all the processes “want” CPU0, none “want” CPU1?
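A hedged sketch of the hard-affinity call named above, pinning the calling process to CPU 0 (Linux-specific):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);    /* allow this process to run on CPU 0 only */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("now pinned to CPU 0\n");
    return 0;
}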

SLIDE 88

Architectural Concerns for Processor Affinity: NUMA Machines

  • In a NUMA computer, the CPU can access some memory more quickly

than other memory

GEQ: what does NUMA stand for?

  • It makes sense to run a process on the CPU which can access that

process’ memory most quickly

SLIDE 89

Scheduling for Hyperthreaded CPU Cores

  • You almost certainly have a computer with multiple cores

  • However, the cores may be hyperthreaded
    – each core holds the state of two or more hardware threads at once
    – but the core can not run more than one (software) thread at a time
  • Instead, the hardware switches between threads at various times
    – e.g., when one thread is waiting for the memory subsystem, another thread might run
    – e.g., UltraSPARC T3: 8 threads/core, 16 cores/chip

  • Depending on the hardware design, there may be a high cost to switching threads

  • Thus for a hyperthreaded CPU, there are two levels of scheduling; the

kernel schedules two processes/threads, and the hardware decides when to switch threads and which thread to run

SLIDE 90

Real-Time CPU Scheduling

  • Recall that we divide “real-time” into two categories:
    – soft real time: the deadlines for response are “soft”, which means it may be merely annoying or inconvenient if a deadline is missed
    – hard real time: something Very Bad may happen if a deadline is missed

  • Event latency: the amount of time from when an event happens until it is serviced
    – e.g., if you are timing user interactions, the time from when the user actually (say) presses a key until the keypress has been detected and recorded

  • In a real-time system it is important to minimize event latency

SLIDE 91

Real-Time CPU Scheduling: 2

  • We can discuss two sources of latency:
    – interrupt latency: the amount of time between when an interrupt arrives at the CPU until the interrupt service routine starts
      – on most architectures, the CPU must complete the current instruction, then must determine which ISR to call
      – some architectures allow interrupts to be turned off (possibly for limited amounts of time); this can add significantly to the interrupt latency
    – dispatch latency: the amount of time to suspend one process and start another
      – minimizing dispatch latency requires a kernel which will preempt the running process in favour of a real-time process which is ready for the CPU

SLIDE 92

Real-Time CPU Scheduling: 2a

  • Note that we also should consider the time taken by the ISR

– architectures can speed this up in various ways; e.g., one machine with 7 interrupt priority levels had 8 sets of registers, so that running an ISR did not require saving or restoring registers

SLIDE 93

Real-Time CPU Scheduling: 2b

Conflicts: a process may be running in the kernel, and it may have resources required by a higher-priority process

SLIDE 94

Real-Time CPU Scheduling: 3

  • There are various scheduling algorithms peculiar to real-time tasks
  • E.g., some real-time tasks may be periodic (e.g., sampling audio data?)
    – such a task could inform the scheduler of its run time and of its deadline
    – the scheduler would be responsible for ensuring that during every period, the task gets its CPU requirement before the deadline
  • Earliest deadline scheduling: the scheduler assigns priorities according to deadlines
    – if tasks A and B must finish by TA and TB respectively, and TA < TB, then task A is given a higher priority
    – when a task becomes ready to run, the scheduler must be able to find out the task’s deadline

SLIDE 95

Linux Process Scheduling (Real-time vs. Non-real-time)

  • The sched_setscheduler(pid_t pid, int policy, const struct sched_param * param) function allows processes (with appropriate permissions) to change scheduling policies (their own or other processes’)
  • Non-real-time policies (param’s sched_priority = 0):
    – SCHED_OTHER, the default time-sharing policy
    – SCHED_BATCH for “batch” style execution of processes
    – SCHED_IDLE for running very low priority background jobs
  • Real-time policies (1 ≤ sched_priority ≤ 99): see man sched_setscheduler
    – SCHED_FIFO, a first-in, first-out policy
    – SCHED_RR, a round-robin policy
  • Scheduling is preemptive, based upon priority
  • On a single CPU machine, be very careful with real-time policies (DAMHIKT)
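A hedged sketch of the call itself, requesting SCHED_RR at real-time priority 10 for the calling process (requires appropriate privilege, e.g., root or CAP_SYS_NICE):

#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 10 };

    /* pid 0 means "the calling process" */
    if (sched_setscheduler(0, SCHED_RR, &sp) != 0) {
        perror("sched_setscheduler");
        return 1;
    }
    printf("now running under SCHED_RR\n");
    return 0;
}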

SLIDE 96

Linux Scheduling

  • Linux used to use the “traditional” Unix scheduler

  • The original scheduling algorithm was replaced with an O(1) time algorithm in Linux 2.6, and then (in 2.6.23) by the (*cough*) Completely Fair Scheduler (CFS)
  • The scheduler uses two variables:
    – real-time priority
    – “nice” value
  • The nice value ranges from −20 (higher priority) to 19 (low priority)
  • Real-time priorities range from 1 to 99 (at most; call sched_get_priority_min() and sched_get_priority_max() to get the values on the current system)
  • See this for a somewhat dated (2009) discussion of Linux schedulers

SLIDE 97

Linux Scheduling (One Version, Anyway)

  • “Active” tasks have time left in their time slice

  • Real-time tasks are scheduled strictly by their (fixed) priority

– a time-sharing task’s priority is adjusted according to how long it was blocked for I/O

  • Real-time tasks get larger time slices
  • See kernel/sched_fair.c in the Linux 3.2 kernel source for an older version of the algorithm

SLIDE 98

Windows Ex-Pee Priorities

  • Note that real-time tasks always have higher priority than other tasks
  • The “base” (“normal”) priorities for a given type of task can be shifted

up and down, as conditions dictate

  • The active (foreground) process gets a boost in its scheduling priority

and a longer time slice to enhance response time and turnaround time
