COMP 3713 Operating Systems Slides Part 4 Jim Diamond CAR 409 - - PowerPoint PPT Presentation

SLIDE 1

COMP 3713 — Operating Systems Slides Part 4

Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University

SLIDE 2

Acknowledgements

These slides borrow heavily from those prepared for “Operating System Concepts” (eighth edition) by Silberschatz, Galvin and Gagne.

SLIDE 3

Chapter 8

Main Memory

SLIDE 4

Chapter 8 202

Background

  • A program must be brought (from disk or other mass storage) into memory and placed within a process for it to be run
  • Main memory and registers are the only storage the CPU can access directly
  • Register access can be done within one CPU clock cycle
  • Accessing main memory can take many CPU cycles
  • The cache sits between the main memory and the CPU registers
  • Protection of memory is required to ensure correct operation

SLIDE 5

Chapter 8 203

Memory Protection: Base and Limit Registers

  • A pair of registers (base and limit) defines the address space a process is permitted to use
  • EVERY memory access by a user-space program is checked against these limits
    – an attempted access outside this area generates an interrupt, which triggers the kernel to kill the offending process
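The check described on this slide can be sketched in a few lines. This is only an illustration in software of what the hardware does on every access: the names `check_access` and `ProtectionFault` are hypothetical, and the base/limit values in the usage note are made-up example numbers.

```python
class ProtectionFault(Exception):
    """Stands in for the interrupt the hardware would raise."""


def check_access(address: int, base: int, limit: int) -> int:
    """Return the address if it lies in [base, base + limit), else 'trap'."""
    if not (base <= address < base + limit):
        # On real hardware this is an interrupt handled by the kernel,
        # which would typically kill the offending process.
        raise ProtectionFault(
            f"address {address} outside [{base}, {base + limit})"
        )
    return address
```

For example, with an assumed base of 300040 and limit of 120900, addresses 300040 through 420939 pass the check and 420940 raises `ProtectionFault`.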

SLIDE 6

Chapter 8 204

Binding of Instructions and Data to Memory Addresses

  • Address binding: the decision about where each instruction and data item will be stored in memory
  • Address binding of instructions and data to memory addresses can happen at three different stages
    – compile time: if the memory location is known in advance, absolute code can be generated; must recompile code if starting location of process changes!
    – load time: must generate relocatable code if memory location is not known at compile time
    – execution time: binding delayed until run time if the process can be moved during its execution from one memory segment to another
      – need hardware support for address maps (e.g., base and limit registers)
  • The first two methods are archaic for general-purpose multi-programmed computers

SLIDE 7

Chapter 8 205

Turning a Program Into A Process

SLIDE 8

Chapter 8 206

Logical vs. Physical Address Space

  • The concept of a logical address space that is bound to a separate physical address space is central to proper memory management
    – logical address: generated by the CPU; also referred to as a virtual address
    – physical address: address seen by the memory unit
  • Logical and physical addresses are the same in compile-time and load-time address-binding schemes
  • Logical (virtual) and physical addresses differ in execution-time address-binding schemes

SLIDE 9

Chapter 8 207

Memory-Management Unit (MMU)

  • The MMU is the hardware device that maps virtual addresses to physical addresses
  • In a simple MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory
  • The user program deals with logical addresses; it never sees the real physical addresses
    – GEQ: what happens if you run two separate copies of such a program at the same time?

SLIDE 10

Chapter 8 208

Dynamic Relocation Using a Relocation Register

SLIDE 11

Chapter 8 209

Dynamic Loading of Program Code

  • Idea: rather than loading ALL of the program code from the disk to memory when a process starts, only load a function if/when it is needed at run time
  • Advantage: better memory-space utilization: unused routines are never loaded
  • Advantage: program can start more quickly since less code must be loaded into memory (initially, at least)
  • No special support (maybe a little support?) from the operating system is required
    – that is, the program keeps track of what it has loaded into memory, and loads functions into memory before it calls them
    – it can unload functions when they are no longer needed

SLIDE 12

Chapter 8 210

Dynamic Linking

  • Idea: rather than copying code from libraries to the program executable when a program is created, only link to a function if/when it is needed at run time
  • A small piece of code, the stub, is used to locate the appropriate memory-resident library routine
  • The operating system is needed to check if the desired routine is in the process’s memory address space
  • Advantages:
    – large libraries don’t have to be linked into every program, saving lots of disk space
    – all processes using a given shared library share the in-core copy of the code, saving main memory space
  • See the ldd command in Linux

SLIDE 13

Chapter 8 211

Swapping

  • A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution
  • Backing store: usually a fast disk
    – book says “large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images”... not on (most?) modern OSes
  • Roll out, roll in: swapping variant used for priority-based scheduling algorithms; lower-priority process is swapped out so higher-priority process can be loaded and executed
  • The major component of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped
  • Modified versions of swapping are found on many systems (e.g., UNIX, Linux, and MS Windows)
  • System maintains a queue of ready-to-run processes whose memory images are on disk

SLIDE 14

Chapter 8 212

Schematic View of Swapping

SLIDE 15

Chapter 8 213

Contiguous Memory Allocation: 1

One Way to Accommodate OS & Processes

  • Main memory is usually divided into two partitions:
    – resident operating system, usually held in low memory with the interrupt vector
    – user processes are held in high memory
  • Relocation registers are used to protect user processes from each other, and from changing operating system code and data
    – base register contains the value of the smallest physical address for the currently running process
    – limit register contains the range of logical addresses: each logical address must be less than the limit register

SLIDE 16

Chapter 8 214

Hardware Support for Relocation and Limit Registers

SLIDE 17

Chapter 8 215

Contiguous Memory Allocation: 2

  • Multiple-partition allocation (can be fixed- or variably-sized)
  • Hole: a block of available memory; holes of various sizes are scattered throughout memory
  • When a process arrives, it is allocated memory from a hole large enough to accommodate it
  • Operating system maintains information about (a) allocated partitions and (b) free partitions (holes)

[Figure: four successive memory layouts: (1) OS, process 5, process 8, process 2; (2) OS, process 5, process 2; (3) OS, process 5, process 9, process 2; (4) OS, process 5, process 9, process 10, process 2]

SLIDE 18

Chapter 8 216

Dynamic Storage-Allocation Problem

  • How should we satisfy a request of size n from a list of free holes?
  • First-fit: use the first hole that is big enough
  • Best-fit: use the smallest hole that is big enough; must search entire list, unless it is ordered by size
  • Worst-fit: allocate the largest hole; may also need to search entire list, if it is not sorted and you don’t have quick access to the largest block

  • Textbook claims:

– – I believe this – First-fit and best-fit are better than worst-fit in terms of speed –
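The three placement strategies on this slide can be compared with a small sketch. It assumes the free list is just a Python list of hole sizes and returns the index of the chosen hole; the function names and the hole sizes in the usage note are illustrative, not taken from the slides.

```python
from typing import List, Optional


def first_fit(holes: List[int], n: int) -> Optional[int]:
    """Index of the first hole with size >= n, or None if none fits."""
    for i, size in enumerate(holes):
        if size >= n:
            return i
    return None


def best_fit(holes: List[int], n: int) -> Optional[int]:
    """Index of the smallest hole with size >= n (scans the whole list)."""
    candidates = [i for i, size in enumerate(holes) if size >= n]
    return min(candidates, key=lambda i: holes[i]) if candidates else None


def worst_fit(holes: List[int], n: int) -> Optional[int]:
    """Index of the largest hole, provided it is big enough."""
    candidates = [i for i, size in enumerate(holes) if size >= n]
    return max(candidates, key=lambda i: holes[i]) if candidates else None
```

With assumed holes of sizes 100, 500, 200, 300 and 600 and a request of 212, first-fit picks the 500 hole, best-fit the 300 hole, and worst-fit the 600 hole.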

SLIDE 19

Chapter 8 217

Fragmentation

  • External Fragmentation: total memory space exists to satisfy a request, but it is not contiguous
  • Internal Fragmentation: allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition, but not being used
  • We can reduce external fragmentation by compaction
    – shuffle memory contents to place all free memory blocks together in one large block
    – compaction is possible only if relocation is dynamic, and is done at execution time
    – I/O problem: what if DMA is being done to some memory address in a block we want to move?
      Soln 1: lock job in memory while it is involved in I/O
      Soln 2: do I/O only into OS buffers

SLIDE 20

Chapter 8 218

Dealing with Fragmentation

  • There are two methods of dealing with fragmentation:
    – segmentation
    – paging
  • These two methods can be combined
  • The next few slides discuss these methods

SLIDE 21

Chapter 8 219

Segmentation

  • Segmentation is a memory-management scheme that supports the user view of memory
  • A program is a collection of segments
  • A segment is a logical unit such as:
    – main program
    – procedure/function/method
    – object
    – local variables, global variables
    – common block
    – stack
    – symbol table
    – array
  • A user’s view of a program [figure not reproduced]

SLIDE 22

Chapter 8 220

Logical View of Segmentation

[Figure: segments 1, 2, 3 and 4 in the user-space view, mapped into physical memory space]

SLIDE 23

Chapter 8 221

Segmentation Architecture: 1

  • Logical address consists of a two-tuple: <segment-number, offset>
  • Segment table: maps two-dimensional logical addresses to one-dimensional physical addresses; each table entry has:
    – base: contains the starting physical address where the segment resides in memory
    – limit: specifies the length of the segment
  • The Segment-Table Base Register (STBR) points to the segment table’s location in memory
  • The Segment-Table Length Register (STLR) indicates the number of segments used by a program
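A minimal sketch of the lookup this architecture implies, assuming the segment table is a Python list of (base, limit) pairs and that the list length plays the role of the STLR. The names `translate` and `SegmentationFault` and the table values in the usage note are hypothetical.

```python
from typing import List, Tuple


class SegmentationFault(Exception):
    """Stands in for the trap raised on an illegal segment or offset."""


def translate(segment_table: List[Tuple[int, int]], s: int, d: int) -> int:
    """Map logical address <s, d> to a physical address."""
    # The table length plays the role of the STLR here.
    if not (0 <= s < len(segment_table)):
        raise SegmentationFault(f"illegal segment number {s}")
    base, limit = segment_table[s]
    # The offset must fall inside the segment: 0 <= d < limit.
    if not (0 <= d < limit):
        raise SegmentationFault(f"offset {d} beyond limit {limit}")
    return base + d
```

With an assumed table of `[(1400, 1000), (6300, 400), (4300, 400)]`, the logical address <2, 53> maps to 4300 + 53 = 4353, while <0, 1222> traps because the offset exceeds the 1000-byte limit.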

SLIDE 24

Chapter 8 222

Segmentation Architecture: 2

  • Protection: each entry in the segment table carries
    – a validation bit; validation bit == 0 ⇒ illegal segment
    – read/write/execute privileges
  • Protection bits are associated with segments; code sharing occurs at the segment level
    – e.g., the reentrant (q.v.) code for an editor would be in one segment
    – reentrant code is “pure” instructions with no embedded (writable) data
  • Since segments vary in length, memory allocation is a dynamic storage-allocation problem
  • A segmentation example is shown in the diagram in the next slide

SLIDE 25

Chapter 8 223

Segmentation Hardware

SLIDE 26

Chapter 8 224

Example of Segmentation

SLIDE 27

Chapter 8 225

Segmentation: Issue

  • A problem with segmentation is that we still have a dynamic memory allocation problem
  • That is, what happens if a program needs a 200 MB segment and there is no contiguous free block of memory ≥ 200 MB?
  • On to Plan C...

SLIDE 28

Chapter 8 226

Paging

  • Using a system in which the memory space for a given process is contiguous has problems; paging comes to the rescue:
    (1) Allow the physical address space of a process to be non-contiguous; the process is allocated physical memory wherever the latter is available (this is a bit like segmentation)
    (2) Divide physical memory into fixed-sized blocks called frames (their size is a power of 2, often between 512 bytes and 8,192 bytes)
    (3) Divide logical memory into blocks of the same size, called pages
    (4) Keep track of all free frames
    (5) To run a program of size n pages, the OS needs to find n free frames and load the program
    (6) Set up a page table to translate logical addresses to physical addresses
  • Internal fragmentation remains a problem, but a relatively minor one
    – GEQ: does frame size affect internal fragmentation?
    – if so, is internal fragmentation worse for small or big pages?

SLIDE 29

Chapter 8 227

Address Translation Scheme

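  • The address generated by the CPU is divided into:
    – page number (p): used as an index into the page table, which contains the base address of each page in physical memory
    – page offset (d): combined with base address to define the physical memory address that is sent to the memory unit
  • Suppose the logical address space is 2^m and the page size is 2^n:

      | page number | page offset |
      |      p      |      d      |
      | m − n bits  |   n bits    |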
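The split into p and d is just a shift and a mask. The sketch below assumes a page table stored as a Python list mapping page number to frame number; the function names and the example table (4-byte pages, 32-byte memory) are illustrative assumptions, not values recovered from the slide figures.

```python
from typing import List, Tuple


def split_address(addr: int, n: int) -> Tuple[int, int]:
    """Split a logical address into (page number p, page offset d).

    n is the number of offset bits, i.e. the page size is 2**n.
    """
    p = addr >> n               # high-order m - n bits
    d = addr & ((1 << n) - 1)   # low-order n bits
    return p, d


def translate(addr: int, page_table: List[int], n: int) -> int:
    """Replace the page number with its frame number; keep the offset."""
    p, d = split_address(addr, n)
    frame = page_table[p]
    return (frame << n) | d
```

With 4-byte pages (n = 2) and an assumed page table `[5, 6, 1, 2]`, logical address 13 is page 3, offset 1, which lands in frame 2 at physical address 9.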

SLIDE 30

Chapter 8 228

Paging Hardware

SLIDE 31

Chapter 8 229

Paging Model of Logical and Physical Memory

SLIDE 32

Chapter 8 230

Paging Example: 32-Byte Memory, 4-Byte Pages

SLIDE 33

Chapter 8 231

Keeping Track of Free Frames

[Figure: the free-frame list before allocation and after allocation]

SLIDE 34

Chapter 8 232

Implementation of Page Table

  • The page table is kept in main memory (it is too big for registers, except on machines with very small address spaces)
  • The page-table base register (PTBR) points to the page table
  • The page-table length register (PTLR) indicates the size of the page table
  • In this scheme every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction
  • The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called associative memory or translation look-aside buffers (TLBs)
  • Some TLBs store address-space identifiers (ASIDs) in each TLB entry: an ASID uniquely identifies each process to provide address-space protection for that process

SLIDE 35

Chapter 8 233

Paging Hardware With TLB

SLIDE 36

Chapter 8 234

Effective Access Time

  • Let the associative lookup time be ε
  • Assume the memory cycle time is 1 microsecond
  • The hit ratio is the fraction of the time that a page number is found in the associative registers
  • Let 0 ≤ α ≤ 1 be the hit ratio
  • Then the Effective Access Time (EAT), in microseconds, is given by

      EAT = (1 + ε)α + (2 + ε)(1 − α) = 2 + ε − α

    – if α ≈ 1 then EAT ≈ 1 + ε, which is acceptable
    – if α ≈ 0 then EAT ≈ 2 + ε, which is Very Very Bad
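The formula can be checked numerically with a short sketch. The function name and the sample values of α and ε are assumptions chosen for illustration; the memory cycle time defaults to the 1 microsecond used on the slide.

```python
def effective_access_time(alpha: float, epsilon: float,
                          memory_time: float = 1.0) -> float:
    """EAT for a TLB with hit ratio alpha and lookup time epsilon.

    A hit costs one memory access plus the lookup; a miss costs two
    memory accesses (page table, then data) plus the lookup.
    """
    hit_cost = memory_time + epsilon
    miss_cost = 2 * memory_time + epsilon
    return alpha * hit_cost + (1 - alpha) * miss_cost
```

For instance, with an assumed α = 0.8 and ε = 0.2 microseconds, both the expanded form and the closed form 2 + ε − α give 1.4 microseconds.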

SLIDE 37

Chapter 8 235

Memory Protection

  • Memory protection is implemented by associating a protection bit with each frame
  • A valid-invalid bit is attached to each entry in the page table:
    – “valid” indicates that the associated page is in the process’s logical address space, and is thus a legal page
    – “invalid” indicates that the page is not in the process’s logical address space

SLIDE 38

Chapter 8 236

Valid (v) or Invalid (i) Bit In A Page Table

SLIDE 39

Chapter 8 237

Sharing Pages

  • Idea: many copies of a given program might be running at once
    – suppose you have 50 people concurrently using a given editor and the sharable part of the code is 1 MB
    – sharing the code amongst these processes would save 49 MB
  • Shared code
    – reentrant: code is “pure” → no modifiable data
  • Private code and data
    – each process keeps a separate copy of the code and data
    – the pages for the private code and data can appear anywhere in the logical address space

SLIDE 40

Chapter 8 238

Example of Shared Pages

SLIDE 41

Chapter 8 239

Page Table Implementation

  • Issue: if the page size is, say, 2^12 bytes, and the address space is 32 bits, then a page table could have 2^20 entries
  • It is undesirable to allocate the page table contiguously in memory
  • One solution: divide the page table into smaller pieces
  • For example, have a hierarchical page table
    – two or more levels in the hierarchy

SLIDE 42

Chapter 8 240

Two-Level Page-Table Scheme

SLIDE 43

Chapter 8 241

Two-Level Paging Example

  • A logical address (on a 32-bit machine with 1K page size) is divided into:
    – a page number consisting of 22 bits
    – a page offset consisting of 10 bits
  • Since the page table is itself paged, the page number is further divided into:
    – a 12-bit page number
    – a 10-bit page offset
  • Thus, a logical address looks as follows:

      |    page number    | page offset |
      |   p1    |   p2    |      d      |
      | 12 bits | 10 bits |   10 bits   |

    where p1 is an index into the outer page table, and p2 is the displacement within the page of the (inner) page table
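As a rough illustration of the 12/10/10 split, the sketch below extracts p1, p2 and d from a 32-bit logical address with shifts and masks. The function name and the sample address are assumptions made for this example.

```python
from typing import Tuple

OFFSET_BITS = 10   # d: 1K pages
INNER_BITS = 10    # p2: index within one page of the page table
OUTER_BITS = 12    # p1: index into the outer page table


def split_two_level(addr: int) -> Tuple[int, int, int]:
    """Return (p1, p2, d) for a 32-bit logical address."""
    d = addr & ((1 << OFFSET_BITS) - 1)
    p2 = (addr >> OFFSET_BITS) & ((1 << INNER_BITS) - 1)
    p1 = (addr >> (OFFSET_BITS + INNER_BITS)) & ((1 << OUTER_BITS) - 1)
    return p1, p2, d
```

An address built as `(5 << 20) | (7 << 10) | 9` splits back into p1 = 5, p2 = 7, d = 9.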

SLIDE 44

Chapter 8 242

Two-Level Address-Translation Scheme

      |    page number    | page offset |
      |   p1    |   p2    |      d      |
      | 12 bits | 10 bits |   10 bits   |

p1 is an index into the outer page table, and p2 is the displacement within the page of the (inner) page table

SLIDE 45

Chapter 8 243

Three-Level Paging Scheme

  • Suppose you have a 64-bit address space and 4K pages, and try the same scheme; you get:

      | outer page | inner page | page offset |
      |     p1     |     p2     |      d      |
      |  42 bits   |  10 bits   |   12 bits   |

    The outer page table has 2^42 entries; this is way too big
  • Solution(?): use a three-level scheme:

      | 2nd outer page | outer page | inner page | page offset |
      |       p1       |     p2     |     p3     |      d      |
      |    32 bits     |  10 bits   |  10 bits   |   12 bits   |

    The outer page table is still too big... could go to a 5-level scheme
    – this would require a lot of page table accesses for each actual memory access: too inefficient
    – for a very large address space, hierarchical schemes don’t work

SLIDE 46

Chapter 8 244

Hashed Page Tables

  • Idea: use a hash table to look up addresses

  • The virtual page number is hashed into a page table

– this page table contains a chain of elements hashing to the same location

  • Virtual page numbers are compared in this chain searching for a match
  • If a match is found, the corresponding physical frame is extracted
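A small sketch of the chained lookup described on this slide. It assumes a fixed number of buckets and a trivial hash (virtual page number modulo the bucket count); the class and method names are hypothetical, and a real implementation lives in the kernel and MMU rather than in Python.

```python
from typing import List, Optional, Tuple


class HashedPageTable:
    """Hash table of chains of (virtual page number, frame) pairs."""

    def __init__(self, num_buckets: int = 16) -> None:
        self.buckets: List[List[Tuple[int, int]]] = [
            [] for _ in range(num_buckets)
        ]

    def _chain(self, vpn: int) -> List[Tuple[int, int]]:
        # Assumed hash function: virtual page number mod bucket count.
        return self.buckets[vpn % len(self.buckets)]

    def insert(self, vpn: int, frame: int) -> None:
        chain = self._chain(vpn)
        for i, (v, _) in enumerate(chain):
            if v == vpn:            # already mapped: update in place
                chain[i] = (vpn, frame)
                return
        chain.append((vpn, frame))

    def lookup(self, vpn: int) -> Optional[int]:
        # Walk the chain comparing virtual page numbers.
        for v, frame in self._chain(vpn):
            if v == vpn:
                return frame
        return None                 # no match: would be a page fault
```

With 4 buckets, virtual pages 1 and 5 hash to the same chain, yet each lookup still returns its own frame because the chain is searched by page number.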

SLIDE 47

Chapter 8 245

Inverted Page Table

  • One entry for each real frame of memory
  • Entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page
  • Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs
  • Use a hash table to limit the search to one (or at most a few) page-table entries

SLIDE 48

Chapter 8 246

Example: The Intel Pentium (32 bit)

  • Supports both segmentation and segmentation with paging
  • CPU generates a logical address
    – the segmentation unit produces a linear address
    – the linear address is given to the paging unit
  • The segmentation and paging units form the equivalent of the MMU
  • The logical address is composed of a 16-bit “selector” plus a 32-bit offset
    – the selector specifies a segment and two protection bits
    – each segment can be up to 4 GB

SLIDE 49

Chapter 8 247

Intel Pentium Segmentation

  • Up to 16K segments per process(!) ⇒ up to 8K local (i.e., private to that process), up to 8K global (i.e., shared)
    – but there are only 6 segment registers, so only 6 segments can be used at a time! (there are also 6 registers to hold the corresponding segment table entries (“descriptors”), saving memory accesses)

SLIDE 50

Chapter 8 248

Pentium Paging Architecture

  • The CR3 register points to this process’s outer page table
  • Information in the outer page table indicates whether it points to an inner page table or a 4 MB frame
  • Info in the outer page table indicates whether the inner table is in memory or swapped out

SLIDE 51

Chapter 8 249

Three-level Paging in Linux

  • Linux runs on many architectures, including (duh) 64-bit machines
    – only uses 6 segments, matching what Pentium has to offer
  • Linux breaks the linear addr into 4 parts, as in the figure
    – but on the Pentium (only 2-level paging) the middle directory is 0 bits
  • Each task in Linux has its own set of page tables; the contents of the CR3 register form part of the task’s context
  • 3 level? Ha! Archaic. See https://lwn.net/Articles/708526/

SLIDE 52

Chapter 9

Virtual Memory

SLIDE 53

Chapter 9 250

Background

  • Virtual memory: separation of user logical memory from physical memory
    – the logical address space can therefore be much larger than the physical address space
    – allows address spaces to be shared by several processes
  • Virtual memory can be implemented via:
    – demand paging
    – demand segmentation

SLIDE 54

Chapter 9 251

Virtual Memory That is Larger Than Physical Memory

SLIDE 55

Chapter 9 252

Virtual Address Space

  • (Where have we seen this before??)
  • This is (sort of) how the programmer (compiler writer?) sees the address space of a program
  • Also note that in some machines, the stack could grow from low addresses to high addresses
  • The “hole” in the address space is not needed until the heap or stack grows into that space

SLIDE 56

Chapter 9 253

Library Sharing Using Virtual Memory

SLIDE 57

Chapter 9 254

Demand Paging

  • Idea: bring a page into memory only when it is needed
    – faster response on program start-up
    – less memory needed, or more tasks can be in memory at once
  • When a page is needed, examine the page’s reference
    – invalid reference ⇒ abort
    – valid, but not in memory ⇒ bring into memory
  • Recall a scheduler might swap a task out to disk:
    – lazy swapper: never swaps a page into memory unless that page will be needed
  • A swapper that deals with pages, rather than entire processes, is a “pager”
    – “swapper” is technically incorrect in this context

SLIDE 58

Chapter 9 255

Transfer of a Paged Memory to Contiguous Disk Space

  • A demand paging system is (vaguely?) similar to a (non-demand) paging system in which entire processes are swapped out to disk
  • This picture shows contiguous regions of in-memory pages being mapped to contiguous blocks on disk
  • GEQ: why would we care about contiguousness?

SLIDE 59

Chapter 9 256

Page Fault

  • Q: what happens if a memory access refers to a page not currently in main memory?
  • If there is a reference to such a page, the paging h/w generates an interrupt, trapping to the operating system: this is called a page fault
    (i) The operating system looks at another table to decide whether the page fault is:
        (a) an invalid reference ⇒ abort
        (b) a valid reference to a page that is just not in memory
    (ii) If just not in memory,
        (a) find (or create!) an empty frame
        (b) read the page from disk into that frame
        (c) reset tables
        (d) set validation bit = v
        (e) restart the instruction that caused the page fault (!!)
  • Restarting the instruction could be complex, if the instruction had already modified registers or memory locations before the page fault occurred

SLIDE 60

Chapter 9 257

Steps in Handling a Page Fault

SLIDE 61

Chapter 9 258

Valid-Invalid Bit

  • A “valid-invalid” bit is associated with each page table entry (“v”: in-memory, “i”: not-in-memory)
  • Initially the valid-invalid bit is set to i on all entries
  • Example of a page table snapshot:

      Frame #   Valid-invalid bit
      xxx       v
      yyy       v
      zzz       v
      www       i
      ...       ...
      aaa       i
      bbb       i

  • During address translation, if the valid-invalid bit in the page table entry is “i”, a page fault is generated
    – check out the output of

          time <some command>

      under tcsh and look for the <n>pf info (make sure you run some (preferably big) program which you have not run since you last booted)

SLIDE 62

Chapter 9 259

Page Table When Some Pages Are Not in Main Memory

SLIDE 63

Chapter 9 260

Performance of Demand Paging

  • Let the page fault rate be 0 ≤ p ≤ 1
    – if p = 0, there are no page faults
    – if p = 1, every memory reference causes a page fault
  • Effective Access Time (EAT):

      EAT = (1 − p) × memory access time
            + p × (page fault overhead + swap page out time + swap page in time + restart overhead)

  • The page fault overhead is huge! (See the textbook)

SLIDE 64

Chapter 9 261

Demand Paging Example

  • Suppose the memory access time = 200 nanoseconds
  • Suppose the average page-fault service time = 8 milliseconds (the textbook author has something faster than a slow spinny disk!)
  • Then

      EAT = (1 − p) × 200 ns + p × 8 ms
          = (1 − p) × 200 ns + p × 8,000,000 ns
          = 200 ns + p × 7,999,800 ns

  • If one access out of 1,000 causes a page fault, then EAT ≈ 8.2 µs
  • Conclusion: the overall page fault rate had better be very, very small
  • See the arithmetic in the textbook: for performance degradation to be < 10%, we need p < 0.0000025
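The arithmetic on this slide can be reproduced with a short sketch. The function names are hypothetical; the defaults are the 200 ns access time and 8 ms service time assumed on the slide, and the second function simply solves the EAT formula for p given an allowed slowdown.

```python
def demand_paging_eat(p: float, memory_ns: float = 200.0,
                      fault_ns: float = 8_000_000.0) -> float:
    """Effective access time in nanoseconds for page-fault rate p."""
    return (1 - p) * memory_ns + p * fault_ns


def max_fault_rate(slowdown: float, memory_ns: float = 200.0,
                   fault_ns: float = 8_000_000.0) -> float:
    """Largest p for which EAT stays within (1 + slowdown) * memory_ns.

    Solving memory_ns + p * (fault_ns - memory_ns) <= (1 + slowdown) * memory_ns
    for p gives p <= slowdown * memory_ns / (fault_ns - memory_ns).
    """
    return slowdown * memory_ns / (fault_ns - memory_ns)
```

Here `demand_paging_eat(0.001)` comes out at 8199.8 ns, i.e. about 8.2 µs, and `max_fault_rate(0.10)` is roughly 0.0000025, matching the figures quoted on the slide.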

SLIDE 65

Chapter 9 262

Process Creation: Copy on Write

  • Virtual memory and demand paging combine to provide a significant benefit for process creation
  • Consider fork() (which is “usually” followed by exec()): a BFI implementation would require that the entire address space of the process be copied
    – archaic versions of Unix (running on 16-bit computers) did exactly this, by swapping the whole process out of memory and back in twice
  • Instead, to create the child process
    – mark all of the pages “copy on write”
  • When either the child or the parent does a write to a page, the page is then copied to a free frame and the process’s page table is adjusted to point to the new frame
    GEQs: should the other process still do a copy on write for that page?
    ... If so, how do we know when the original frame is free?

SLIDE 66

Chapter 9 263

Modification of a Copy-On-Write Page

  • Before:
  • After:

SLIDE 67

Chapter 9 264

Dirty Cow!

  • https://dirtycow.ninja

SLIDE 68

Chapter 9 265

Finding Free Frames

  • When a page fault happens, or when a write to a copy-on-write page occurs, a new frame is required
  • The OS can maintain a list of free frames
  • Q: What if there are no free frames?
    A: Must select a page to be removed from main memory & page it out
  • Q: How do we select the page to remove?
    A: That’s a good question...
  • Note: to increase efficiency, associate a dirty bit with each page in memory
    – the dirty bit is set iff the page has been written to
    – if the dirty bit is clear when a page is to be removed, it doesn’t need to be copied to disk (it is already there!)

SLIDE 69

Chapter 9 266

Basic Page Replacement

When a page fault happens in process P:
(1) Find the location of the desired page on disk
(2) Find a frame to use:
    – if there is a free frame, use it
    – if there is no free frame, use a page replacement algorithm to select a victim frame
(3) Bring the desired page into the (possibly newly) free frame; update the page and frame tables
(4) Restart process P

SLIDE 70

Chapter 9 267

Page Replacement

SLIDE 71

Chapter 9 268

Page Replacement Algorithms

  • Desire: an algorithm which provides the lowest page-fault rate
    – e.g., we don’t want to remove a page which will be used by the next instruction
  • We can evaluate an algorithm by
    – running it on a particular string of memory references (the reference string) and
    – computing the number of page faults on that string
  • We will use two reference strings in our examples:
      1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
    and
      7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1

SLIDE 72

Chapter 9 269

“Generic” Graph of the Number of Page Faults vs. the Number of Frames

SLIDE 73

Chapter 9 270

First-In-First-Out (FIFO) Page Replacement Algorithm

  • Assume 3 frames are available for this process
  • When all frames are used, the page which has been in memory the longest is chosen as the victim
  • (Of course?) no page is removed when the desired page is currently in memory
  • 15 page faults in total for the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1

SLIDE 74

Chapter 9 271

First-In-First-Out (FIFO) Page Replacement Algorithm

  • Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  • 3 frames (3 pages can be in memory at a time per process):

      Frame   Page number (successive contents)
      1       1  4  5
      2       2  1  3
      3       3  2  4

    ⇒ 9 page faults in total
  • 4 frames:

      Frame   Page number (successive contents)
      1       1  5  4
      2       2  1  5
      3       3  2
      4       4  3

    ⇒ 10 page faults in total!
  • Belady’s Anomaly: with FIFO, more frames can mean more page faults
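The fault counts on the FIFO slides can be reproduced with a small simulation. This is a sketch that counts faults only; the function name `fifo_faults` is an assumption, and a real pager would also track dirty bits and do the disk I/O.

```python
from collections import deque
from typing import Iterable


def fifo_faults(refs: Iterable[int], num_frames: int) -> int:
    """Count page faults for FIFO replacement with num_frames frames."""
    resident = set()        # pages currently in memory
    arrival = deque()       # same pages, oldest arrival first
    faults = 0
    for page in refs:
        if page in resident:
            continue        # hit: FIFO order is not changed by a reference
        faults += 1
        if len(resident) == num_frames:
            victim = arrival.popleft()   # page that has been in memory longest
            resident.discard(victim)
        resident.add(page)
        arrival.append(page)
    return faults
```

Running it on 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 gives 9 faults with 3 frames and 10 faults with 4 frames, which is Belady's anomaly in action.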

SLIDE 75

Chapter 9 272

FIFO Illustrating Belady’s Anomaly

SLIDE 76

Chapter 9 273

Optimal Page Replacement Algorithm

  • Idea: replace the page that will not be used for the longest period of time from now
  • Example: suppose there are 4 frames, and the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

Frame   Page number (in order of arrival)
1       1
2       2
3       3
4       4  5

(the final reference to page 4 also faults; it may replace any of pages 1, 2 or 3, since none of them is used again)

⇒ 6 page faults in total

  • But wait! In real life, how do you know which page won’t be accessed for the longest period of time?
  • But... this algorithm can be used for measuring how well other algorithms perform
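Because the optimal policy needs the whole future reference string, it can only be run offline, which is exactly how it is used as a yardstick. The following is a minimal sketch under that assumption; `optimal_faults` is an illustrative name, and ties between pages that are never used again are broken arbitrarily (here: the first such frame).

```python
def optimal_faults(refs, num_frames):
    """Count page faults when the victim is the page used farthest in the future."""
    frames = []
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)   # a free frame is still available
            continue
        future = refs[i + 1:]

        def next_use(p):
            # Distance to the next reference of p; infinity if never used again.
            return future.index(p) if p in future else float("inf")

        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults

print(optimal_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))  # 6
```

The fault count of any other algorithm on the same string and frame count can then be compared against this lower bound.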


slide-77
SLIDE 77

Chapter 9 274

Optimal Page Replacement

  • 9 page faults in total for the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 with 3 frames


slide-78
SLIDE 78

Chapter 9 275

Least Recently Used (LRU) Algorithm

  • Idea: remove the page that has been used the least recently
  • Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

Frame   Page number (in order of arrival)
1       1  5
2       2
3       3  5  4
4       4  3

⇒ 8 page faults in total with 4 frames (better than FIFO, worse than optimal)

  • Implementation using a counter:

– every page entry has a counter; every time a page is referenced through this entry, copy the clock into the counter

– need h/w support for this?

– when a page needs to be removed, look at the counters to determine the victim
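The counter scheme can be sketched in software as follows. This is only an illustration: the position in the reference string stands in for the hardware clock, and `lru_faults` and `last_used` are hypothetical names.

```python
def lru_faults(refs, num_frames):
    """Count page faults under LRU, using a per-page 'time of last use' counter."""
    last_used = {}   # resident page -> clock value copied at its last reference
    faults = 0
    for clock, page in enumerate(refs):   # the index plays the role of the clock
        if page not in last_used:
            faults += 1
            if len(last_used) == num_frames:
                # Look at the counters: the smallest value is the LRU page.
                victim = min(last_used, key=last_used.get)
                del last_used[victim]
        last_used[page] = clock           # copy the clock into the counter
    return faults

print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))  # 8
```

Note that choosing the victim scans every counter, so the cost of a replacement grows with the number of frames.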


slide-79
SLIDE 79

Chapter 9 276

LRU Page Replacement

  • 12 page faults in total for the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 with 3 frames

– 133% as many as optimal (12 vs. 9)


slide-80
SLIDE 80

Chapter 9 277

LRU Algorithm: Another Implementation

  • “Stack” implementation: keep a “stack” of page numbers in a doubly-linked list
  • When a page is referenced:

– move it to the top of the “stack”

– this requires 5 or 6 pointers to be changed

  • No search for a replacement is required: the victim is always at the bottom of the “stack”, so it is found in O(1) time
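A rough software analogue of the “stack” uses Python's `collections.OrderedDict`, which keeps keys in order and supports moving a key to one end and popping from the other end. The class name `LRUStack` and its methods are assumptions made for this sketch; a real implementation would manipulate the list pointers directly.

```python
from collections import OrderedDict

class LRUStack:
    """LRU via an ordered structure: last key = top of the stack (most recent)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.stack = OrderedDict()
        self.faults = 0

    def reference(self, page):
        """Reference a page; return the evicted page, or None if nothing was evicted."""
        if page in self.stack:
            self.stack.move_to_end(page)   # move it to the top of the stack
            return None
        self.faults += 1
        victim = None
        if len(self.stack) == self.num_frames:
            # The bottom of the stack is the least recently used page.
            victim, _ = self.stack.popitem(last=False)
        self.stack[page] = True
        return victim

lru = LRUStack(4)
victims = [lru.reference(p) for p in [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]]
print(lru.faults)                             # 8
print([v for v in victims if v is not None])  # [3, 4, 5, 1]
```

Compared with the counter approach, every reference does a little more work (the reordering), but picking a victim needs no scan.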


slide-81
SLIDE 81

Chapter 9 278

LRU Approximation Algorithms

  • Reference bit

– When a page is used set its reference bit to 1

– Replace a page whose reference bit is 0 (if such a page exists)

– this is a very crude approximation to LRU

– need to deal with the situation where no reference bits are 0 (e.g., set all reference bits to 0 periodically)

  • Second chance: a somewhat better approximation to LRU

– Keep a circular list of all pages and a pointer into that list

– When a victim is needed, search through the list:

    – if a page has reference bit = 1 then set its reference bit to 0 but leave it in memory

    – if a page has reference bit = 0 use it as the victim
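The second-chance search can be sketched as a “clock hand” sweeping over a fixed array of frames. This is an illustrative sketch only: `second_chance_faults` is a made-up name, and the choice to give a newly loaded page a reference bit of 1 is an assumption (some descriptions start it at 0).

```python
def second_chance_faults(refs, num_frames):
    """Count page faults under the second-chance (clock) algorithm."""
    pages = [None] * num_frames   # circular list of frames
    ref_bit = [0] * num_frames    # one reference bit per frame
    hand = 0                      # the pointer into the circular list
    faults = 0
    for page in refs:
        if page in pages:
            ref_bit[pages.index(page)] = 1   # page used: set its reference bit
            continue
        faults += 1
        # Pages with bit = 1 get a second chance: clear the bit and move on.
        while pages[hand] is not None and ref_bit[hand] == 1:
            ref_bit[hand] = 0
            hand = (hand + 1) % num_frames
        pages[hand] = page        # frame is free or has bit = 0: use it
        ref_bit[hand] = 1         # assumption: loading the page counts as a use
        hand = (hand + 1) % num_frames
    return faults

print(second_chance_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))
```

If every reference bit is 1, the hand clears them all on one full sweep and the algorithm degenerates to FIFO for that replacement.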


slide-82
SLIDE 82

Chapter 9 279

Second-Chance Page-Replacement Algorithm


slide-83
SLIDE 83

Chapter 9 280

Counting Algorithms

  • Keep a counter of the number of references that have been made to each page
  • LFU Algorithm: remove the page with the smallest count
  • MFU Algorithm: remove the page with the largest count

– intuitively sensible???

– based on the argument that the page with the smallest count was probably just brought in and has yet to be used very much

– which principle suggests that it will be used more in the near future?

  • GEQ: should the count be since the start of time or since the page was most recently paged in? Why?
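Both counting policies fit in one small simulation. In this sketch the count is kept since the page was most recently paged in, which is only one of the two options raised in the question above; `counting_faults` is a hypothetical name, and ties are broken in favour of the page that has been resident longest.

```python
def counting_faults(refs, num_frames, policy="LFU"):
    """Count page faults under LFU (evict smallest count) or MFU (evict largest)."""
    counts = {}   # resident page -> references since it was last paged in
    choose = min if policy == "LFU" else max
    faults = 0
    for page in refs:
        if page in counts:
            counts[page] += 1
            continue
        faults += 1
        if len(counts) == num_frames:
            victim = choose(counts, key=counts.get)
            del counts[victim]    # the count is forgotten when the page leaves
        counts[page] = 1
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(counting_faults(refs, 3, "LFU"))
print(counting_faults(refs, 3, "MFU"))
```

Keeping counts since the start of time instead would mean moving `counts` bookkeeping for non-resident pages into a second dictionary that is never pruned.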


slide-84
SLIDE 84

Chapter 9 281

Frame Allocation

  • Each process needs some minimum number of pages to run efficiently
  • Consider a machine which allows

ADDL A, B, C

where A, B and C are all memory locations

– 4 different pages could be referenced by that one instruction!

    • or 7 if the operands all overlapped page boundaries!

    • or 8 if the instruction overlapped a page boundary as well

– if this process only had 6 frames allocated to it, this instruction could wreak havoc

– Q: how do we decide how many frames should be allocated to a given process?

  • Two major allocation schemes are used:

– fixed allocation

– priority allocation
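The 4 / 7 / 8 page counts for the three-operand instruction can be reproduced with a little address arithmetic. Everything concrete here is an assumption for illustration: a 4096-byte page, an 8-byte instruction, 4-byte operands, and the specific addresses.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes

def pages_touched(addr, length, page_size=PAGE_SIZE):
    """Set of page numbers covered by `length` bytes starting at `addr`."""
    first = addr // page_size
    last = (addr + length - 1) // page_size
    return set(range(first, last + 1))

def pages_for_instruction(instr, operands, page_size=PAGE_SIZE):
    """instr and each operand are (address, length_in_bytes) pairs."""
    pages = set(pages_touched(*instr, page_size))
    for op in operands:
        pages |= pages_touched(*op, page_size)
    return pages

aligned = [(8192, 4), (16384, 4), (24576, 4)]      # each operand within one page
straddling = [(12286, 4), (20478, 4), (28670, 4)]  # each operand crosses a boundary

print(len(pages_for_instruction((0, 8), aligned)))        # 4
print(len(pages_for_instruction((0, 8), straddling)))     # 7
print(len(pages_for_instruction((4092, 8), straddling)))  # 8
```

The worst case gives a lower bound on the number of frames a process must hold for this instruction to be able to complete at all.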


slide-85
SLIDE 85

Chapter 9 282

Fixed Allocation

  • Equal allocation: for example, if there are 100 frames and 5 processes, give each process 100/5 = 20 frames.
  • Proportional allocation: allocate according to the size of each process

Let:

$s_i$ = size (in pages) of process $p_i$

$S = \sum_i s_i$

$m$ = total number of frames available

$a_i$ = allocation for $p_i$: $a_i = \frac{s_i}{S} \times m$

– e.g., if $m = 64$, $s_1 = 10$ and $s_2 = 127$, then

$a_1 = \frac{10}{137} \times 64 \approx 5$

$a_2 = \frac{127}{137} \times 64 \approx 59$
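The proportional formula is easy to check numerically. The helper names below are made up for this sketch, and rounding to the nearest integer is an assumption; in general the rounded values need not add up to exactly m, so a real allocator would have to hand out or reclaim the leftover frames.

```python
def equal_allocation(num_frames, num_processes):
    """Give every process the same number of frames (any remainder is left over)."""
    return num_frames // num_processes

def proportional_allocation(sizes, m):
    """Exact (fractional) allocation a_i = s_i / S * m for each process size s_i."""
    total = sum(sizes)   # S
    return [s / total * m for s in sizes]

print(equal_allocation(100, 5))                 # 20
exact = proportional_allocation([10, 127], 64)
print([round(a, 2) for a in exact])             # [4.67, 59.33]
print([round(a) for a in exact])                # [5, 59]
```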


slide-86
SLIDE 86

Chapter 9 283

Priority Allocation

  • Use a proportional allocation scheme using priorities rather than size
  • If process Pi generates a page fault,

– select for replacement a frame from a process with lower priority number

  • Would need to ensure that even low-priority processes get to keep some of their frames!


slide-87
SLIDE 87

Chapter 9 284

Global vs. Local Allocation

  • Global replacement: a process selects a replacement frame from the set of all frames; one process can take a frame from another
  • Local replacement: each process selects only from its own set of allocated frames
  • A disadvantage of global replacement is that a process’s number of page faults (and thus its run time) could vary wildly depending on what other tasks are running on the system
  • However, allowing a process to use frames (belonging to another process) which see little use can provide overall benefits


slide-88
SLIDE 88

Chapter 9 285

Thrashing

  • If a process does not have “enough” pages, the page-fault rate may become very high. This leads to:

– low CPU utilization

– which could make the operating system think that it needs to increase the degree of multiprogramming

– which makes it add another process to the system

  • Thrashing: one or more processes are busy swapping pages in and out, but are not getting much actual work done
  • On systems without paging, entire segments or even processes would have to be swapped, further increasing the problem


slide-89
SLIDE 89

Chapter 9 286

CPU Utilization vs. Multiprogramming When Thrashing May Occur


slide-90
SLIDE 90

Chapter 9 287

Demand Paging and Thrashing

  • Q: Why does demand paging work?

A: Locality of reference

– localities may overlap

  • Q: Why does thrashing occur?

A: The sum of the processes’ locality sizes is > total memory size


slide-91
SLIDE 91

Chapter 9 288

Locality In A Memory-Reference Pattern


slide-92
SLIDE 92

Chapter 9 289

Working-Set Model

  • Define a parameter ∆ ≡ “working-set window” ≡ a fixed number of page references

Example: 10,000 instructions

  • Let $\mathit{WSS}_i$ (working set size of Process $P_i$) = number of distinct pages referenced in the most recent ∆ page references

– if ∆ is too small, it will not encompass the entire locality

– if ∆ is too large, it will encompass several localities

– if ∆ = ∞ the working-set will encompass the entire program

  • Let $D = \sum_i \mathit{WSS}_i$ be the total demand for frames
  • If D > m (recall m is the total # of frames available) thrashing occurs
  • Policy: if D > m, then suspend (swap out) one of the processes

– this keeps the degree of multiprogramming as high as possible without thrashing

– thus CPU utilization is kept high
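The working-set definitions translate directly into a few lines of code. This is a sketch with made-up data: the per-process reference strings `P1` and `P2`, the window ∆ = 5 and m = 5 frames are all hypothetical, and time is measured in page references.

```python
def working_set(refs, t, delta):
    """Distinct pages among the most recent `delta` references, ending at index t."""
    start = max(0, t - delta + 1)
    return set(refs[start:t + 1])

def total_demand(ref_strings, t, delta):
    """D = sum over processes of WSS_i at time t."""
    return sum(len(working_set(refs, t, delta)) for refs in ref_strings)

# Hypothetical reference strings for two processes.
P1 = [1, 2, 1, 3, 4, 4, 4, 3, 4, 4]
P2 = [7, 7, 8, 9, 7, 8, 9, 7, 8, 9]
m = 5   # total number of frames available (assumed)

for t in (4, 9):
    D = total_demand([P1, P2], t, delta=5)
    print(t, D, "suspend a process" if D > m else "ok")
```

In this example the demand drops from 7 to 5 frames as the first process settles into a smaller locality, so the policy would suspend a process at the earlier time but not at the later one.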


slide-93
SLIDE 93

Chapter 9 290

Working-set Model Example

  • Example: let ∆ = 10
  • Note that the size of a process’ working set can change considerably over time
