

SLIDE 1

Operating Systems

Design and Implementation

Chapter 04

(version January 30, 2008)

Melanie Rieback

Vrije Universiteit Amsterdam, Faculty of Sciences

Dept. Computer Science

Room R4.23. Tel: (020) 598 7874 E-mail: melanie@cs.vu.nl, URL: www.cs.vu.nl/∼melanie/

01 Introduction
02 Processes
03 Input/Output
04 Memory Management
05 File Systems

00 – 1 /

SLIDE 2

Memory Management

  • Simple memory management
  • Swapping
  • Virtual memory
  • Page replacement
  • Design issues for paging systems
  • Segmentation
  • Memory management in MINIX

04 – 1 Memory Management/

SLIDE 3

Memory Management Simple Model

Idea: Assume there is just a single process at a time in main memory (the MS-DOS model). Then, there is hardly any memory to be managed:

[Figure: three single-process layouts: (a) operating system in RAM at the bottom, user program above; (b) operating system in ROM at the top of memory (0xFFF…); (c) device drivers in ROM at the top, operating system in RAM at the bottom, user program in between.]

04 – 2 Memory Management/4.1 Simple Model

SLIDE 4

Memory Management Multitasking Model

Idea: just have more than one process in main memory. It’s good for CPU utilization.

[Figure: CPU utilization (0.2–1.0) versus number of processes (1–10), with one curve per I/O-wait fraction p = 0.2, 0.5, and 0.8.]

Question: what can we say about the relation memory–CPU utilization (p is the fraction of time spent in I/O wait)? Overall conclusion: having several processes in memory simultaneously (and that have things to do) is a good idea.

04 – 3 Memory Management/4.1 Simple Model

SLIDE 5

Program Relocation (1/2)

Problem: After compiling and linking programs, we don’t want to have addresses hardcoded into the binary: that would only allow us to place the program at one fixed location in memory. Idea: Use relative addressing: all addresses in a binary are relative to the start address after placing it in memory. Solution: get hardware support (rewriting addresses at load time is not really the thing to do).

04 – 4 Memory Management/4.1 Simple Model

SLIDE 6

Program Relocation (2/2)

[Figure: relocation with a base register: the program is loaded at physical address 1400, so the base register holds 1400; the operand of MOVE 100,D0 is translated at run time relative to the base (1400 + 100 = 1500); the program extends up to 1400 + 8965.]

04 – 5 Memory Management/4.1 Simple Model

SLIDE 7

Program Protection (1/2)

Problem: we can have some rather nasty problems:

nasty: mov A0, #end    ; Move the address of the
                       ; last instruction into register A0
loop:  mov (A0), #0    ; Store 0 at the location
                       ; identified by A0
       add A0, #1      ; Increment the address in A0
       jmp loop        ; Loop forever
end:   ret             ; Never reached

04 – 6 Memory Management/4.1 Simple Model

SLIDE 8

Program Protection (2/2)

Solution: make use of a limit register.

[Figure: base register = 1400, limit register = 12000; every address is checked against the limit before the base is added. The address in MOVE 100,D0 is ok and passes both checks; an address of 12000 or more is not ok and raises an out-of-limit signal.]
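The base/limit check that the hardware performs on every memory reference can be sketched as follows (the values 1400 and 12000 are taken from the figure; the function and variable names are illustrative, not part of any real MMU interface):

```c
#include <stddef.h>

/* Hypothetical base and limit registers, loaded by the OS when the
 * process is dispatched. */
static unsigned long base_reg  = 1400;
static unsigned long limit_reg = 12000;

/* Translate a program-relative address to a physical address.
 * Returns 0 and fills *phys on success; returns -1 (standing in for
 * the out-of-limit signal) when the address exceeds the limit. */
int bl_translate(unsigned long rel, unsigned long *phys)
{
    if (rel >= limit_reg) return -1;   /* protection fault */
    *phys = base_reg + rel;            /* relocation */
    return 0;
}
```

So the operand of MOVE 100,D0 would reference physical address 1500, while an access at 12000 would fault.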

04 – 7 Memory Management/4.1 Simple Model

SLIDE 9

Fragmentation: Problem

Problem: it may be that there are more processes available than memory can handle ⇒ swap processes between primary and secondary memory. This may lead to memory fragmentation.

program          P1    P2    P3    P4    P5
size             1024  595   320   560   482
completion time  2     3     4     4     2

[Figure: memory maps (boundaries at 560, 1024, 1506, 1619, 1939, 2048) over time: first P1, P2, and P3 are loaded; in steps 1–3, P1 and P2 complete and are replaced by P4 and P5, leaving scattered holes between the remaining processes.]

Question: What is wrong with this example?

04 – 8 Memory Management/4.1 Simple Model

SLIDE 10

Fragmentation: Solution

  • Move all running processes so that they are adjacent, leaving one big chunk of free memory (memory compaction). Not really doable.
  • Keep track of adjacent holes of free memory and merge them into one. Note that this is done in software.

[Figure: over time, a chunk of allocated memory being returned is merged with its neighboring free chunks into a single larger hole.]

04 – 9 Memory Management/4.1 Simple Model

SLIDE 11

Memory Organization per Process

Idea: you have to be aware of processes growing and shrinking ⇒ take this into account when allocating memory.

[Figure: (a) processes A and B each allocated with room for growth above the part actually in use; (b) each process with a single growth area between its data segment (growing upward) and its stack (growing downward), with program, data, and stack segments per process.]

Question: Is it possible to have a dynamic allocation of memory? How should this work? How does this compare to malloc() and free()?

04 – 10 Memory Management/4.2 Swapping

slide-12
SLIDE 12

MM – Bit Maps

Problem: we want to keep track (in software) of which part of memory is free to allocate to processes.

  • Divide memory into units of blocks, e.g. each block consists of 4 consecutive bytes.
  • Use a bitmap that indicates if a block is allocated or not. Note that each bitmap value requires only a single bit: 0 → the block is free; 1 → the block is allocated.
  • When k consecutive blocks need to be allocated, search for k consecutive 0’s in the bitmap.

Question: Where do we store the bitmap?
Note: Pretty efficient: if the block size is B bytes, we’ll need only 1/(8B)-th of the total memory for the bitmap.
Question: What’s the real disadvantage?
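A minimal sketch of this bitmap scheme (sizes and names are made up for illustration); the linear scan in the loop is exactly the disadvantage the question hints at:

```c
#include <string.h>

#define NBLOCKS 64                         /* illustrative memory size in blocks */
static unsigned char bitmap[NBLOCKS / 8];  /* one bit per block: 0 = free */

static int  test_bit(int i) { return (bitmap[i / 8] >> (i % 8)) & 1; }
static void set_bit(int i)  { bitmap[i / 8] |= (unsigned char)(1 << (i % 8)); }

/* Find and claim k consecutive free blocks; return the index of the
 * first block, or -1 when no run of k zero bits exists. */
int alloc_blocks(int k)
{
    int run = 0;
    for (int i = 0; i < NBLOCKS; i++) {
        run = test_bit(i) ? 0 : run + 1;   /* length of current run of 0’s */
        if (run == k) {
            int start = i - k + 1;
            for (int j = start; j <= i; j++) set_bit(j);
            return start;
        }
    }
    return -1;   /* no hole of k blocks: the search is O(memory size) */
}
```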

04 – 11 Memory Management/4.2 Swapping

SLIDE 13

MM – Linked Lists (1/2)

Idea: keep track of holes by means of a linked list:

[Figure: four snapshots of the hole list, each entry a (start address, length) pair: (5,100), (224,50), (496,29) → (5,100), (175,25), (224,50), (496,29) → (5,125), (175,25), (224,50), (496,29) → (5,195), (224,50), (496,29).]

Question: What’s happened here?

04 – 12 Memory Management/4.2 Swapping

SLIDE 14

MM – Linked Lists (2/2)

Idea: If you use linked lists you can decide on the “best” hole that you can use for allocation:

  • First fit – the first hole big enough is used.
  • Next fit – keep track of the last allocation; the next hole big enough is used.
  • Best fit – choose the smallest hole that is big enough for allocation.
  • Worst fit – choose the largest hole that is big enough.
  • Quick fit – maintain lists that correspond to frequently occurring request sizes.

Question: How relevant do you think all this mumbo jumbo is?
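First fit over a hole list like the one on the previous slide can be sketched in a few lines (the struct and function names are illustrative; for brevity, a hole that shrinks to length 0 is left in the list rather than unlinked):

```c
#include <stdlib.h>

/* A hole list: (start, length) nodes, as in the linked-list slides. */
struct hole { unsigned start, len; struct hole *next; };

/* First fit: take the first hole big enough, carve the allocation off
 * its front, and return the start address (or (unsigned)-1 on failure). */
unsigned first_fit(struct hole *list, unsigned need)
{
    for (struct hole *h = list; h != NULL; h = h->next) {
        if (h->len >= need) {
            unsigned addr = h->start;
            h->start += need;   /* shrink the hole in place */
            h->len   -= need;
            return addr;
        }
    }
    return (unsigned) -1;       /* no hole big enough */
}
```

Next fit differs only in where the scan starts; best/worst fit scan the whole list before deciding.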

04 – 13 Memory Management/4.2 Swapping

SLIDE 15

Paged Memory Systems (1/2)

Problem: So far, it is only possible to place a process into memory when there is a contiguous piece of memory available ⇒ allocation may succeed only after a long time.
Solution: Divide memory and the program into pages, and just check if there are enough free pages for the program. Each page has a fixed size (say 256 bytes).
Needed: A mapping of physical pages (also called page frames) to logical pages.

  • Assume a program is divided into 9 logical pages L0, ..., L8.
  • Main memory is divided into 256-byte page frames. Assume that P6 is the first free page frame ⇒ use P6 to store the data and instructions contained in L0.
  • Use the next free page frame for L1, etc.

04 – 14 Memory Management/4.3 Virtual Memory

SLIDE 16

Paged Memory Systems (2/2)

Note: We need to do something about address translation:

  • In the program we use a logical address, say 26251. Assume page sizes of 256 = 2^8 bytes. We extract the logical page number and page offset as:
      p_log = 26251 div 2^8 = 102
      o     = 26251 mod 2^8 = 139
  • The page number is looked up in the page table to find the page frame number p_phys.
  • The physical address is obtained as p_phys · 2^8 + o.
  • Bad news: we have to do a table lookup for every memory reference.
  • Good news: table lookups can be done very efficiently because (1) we can easily derive the logical page number, and (2) we get some support from the hardware.
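The whole translation step can be sketched directly (the page-table contents here are made up; a real MMU would also check a present bit):

```c
#define PAGE_BITS 8                 /* 256-byte pages, as in the example */
#define PAGE_SIZE (1u << PAGE_BITS)

/* An illustrative page table: page_table[p_log] = p_phys. */
static unsigned page_table[256];

unsigned page_translate(unsigned logical)
{
    unsigned p_log  = logical >> PAGE_BITS;        /* = logical div 2^8 */
    unsigned offset = logical & (PAGE_SIZE - 1);   /* = logical mod 2^8 */
    return page_table[p_log] * PAGE_SIZE + offset; /* p_phys * 2^8 + o  */
}
```

With page_table[102] = 6, the logical address 26251 (page 102, offset 139) maps to 6 · 256 + 139 = 1675.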

04 – 15 Memory Management/4.3 Virtual Memory

SLIDE 17

Paged Systems Hardware Support (1/2)

Note: Deriving the logical page number is easy:

  26251 → 000001100110 (page number, 12 bits) | 10001011 (offset, 8 bits)

Hardware: implementation of the page table (as part of the Memory Management Unit).

04 – 16 Memory Management/4.3 Virtual Memory

SLIDE 18

Paged Systems Hardware Support (2/2)

[Figure: page-table hardware: a page table register locates the table; the 12-bit logical page number of an example address (1634) indexes the table (frame addresses 6, 29, 67, 289, 723, 1879, 2643, 2644, 3802 for entries 0–8), and the selected frame number is combined with the 8-bit page offset to form the physical address.]

04 – 17 Memory Management/4.3 Virtual Memory

SLIDE 19

Multilevel Page Tables

Essence: Instead of having a single, huge page table in memory, we divide the page table into parts using an additional level of indirection.

[Figure: (a) a 32-bit virtual address split into PT1 (10 bits), PT2 (10 bits), and offset (12 bits); (b) each of the 1024 top-level page table entries points to a second-level page table (e.g. the page table for the top 4M of memory), whose entries point to the pages themselves.]

04 – 18 Memory Management/4.3 Virtual Memory

SLIDE 20

Lookaside Buffers

Problem: If we really have to translate every address via the page table, we’re losing a lot of performance. Therefore, keep a number of translations separate in a translation lookaside buffer (TLB). Essence: Instead of going to the page table, we check a virtual address against all entries in the TLB in parallel. Assumption: Most programs exhibit a high degree of locality: they use just a few pages (the working set).
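In hardware all TLB entries are compared at once; a software model of the same lookup can only loop over them (entry layout and names are illustrative):

```c
#define TLB_SIZE 8

struct tlb_entry { int valid; unsigned vpage; unsigned frame; };
static struct tlb_entry tlb[TLB_SIZE];

/* Return the frame for vpage, or -1 on a TLB miss — in which case the
 * MMU would fall back to the page-table walk of the previous slides. */
int tlb_lookup(unsigned vpage)
{
    for (int i = 0; i < TLB_SIZE; i++)
        if (tlb[i].valid && tlb[i].vpage == vpage)
            return (int) tlb[i].frame;   /* hit: no page-table access */
    return -1;                           /* miss */
}
```

Locality is what makes the small size work: if the working set fits in eight entries, almost every reference hits.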

04 – 19 Memory Management/4.3 Virtual Memory

SLIDE 21

Inverted Page Tables (1/2)

Observation: If we have a virtual address space of 2^w bytes and 2^p bytes per page, we need a total of 2^(w−p) entries in a page table. For w = 32 and p = 12, having 1048576 entries is OK (= 4 MB). With w = 64 and p = 12, we have 2^52 entries — a table many petabytes in size. Solution: Store one (process, virtual page) pair per page frame. Problem: Every address still needs to be looked up ⇒ use the TLB in combination with a hash table storing the most often used virtual addresses.
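The inverted-table lookup can be sketched as one entry per physical frame, reached through a hash over (process, virtual page) with chaining for collisions (all sizes, names, and the hash function are made up for illustration):

```c
#define NFRAMES 16

struct ipt_entry { int used; unsigned pid, vpage; int next; };
static struct ipt_entry ipt[NFRAMES];   /* indexed by frame number */
static int hash_anchor[NFRAMES];        /* hash bucket -> first frame */

static unsigned ipt_hash(unsigned pid, unsigned vpage)
{
    return (pid * 31u + vpage) % NFRAMES;
}

void ipt_init(void)
{
    for (int i = 0; i < NFRAMES; i++) { hash_anchor[i] = -1; ipt[i].used = 0; }
}

void ipt_insert(unsigned pid, unsigned vpage, int frame)
{
    unsigned h = ipt_hash(pid, vpage);
    ipt[frame].used = 1; ipt[frame].pid = pid; ipt[frame].vpage = vpage;
    ipt[frame].next = hash_anchor[h];   /* chain into the bucket */
    hash_anchor[h] = frame;
}

/* Return the frame holding (pid, vpage), or -1: a page fault. */
int ipt_lookup(unsigned pid, unsigned vpage)
{
    for (int f = hash_anchor[ipt_hash(pid, vpage)]; f != -1; f = ipt[f].next)
        if (ipt[f].used && ipt[f].pid == pid && ipt[f].vpage == vpage)
            return f;
    return -1;
}
```

The table size now scales with physical memory (number of frames), not with the 2^52-page virtual address space.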

04 – 20 Memory Management/4.3 Virtual Memory

SLIDE 22

Inverted Page Tables (2/2)

Consider w = 64 and p = 12 ⇒ we need 2^52 entries:

[Figure: a traditional page table, indexed by virtual page, needs an entry for each of the 2^52 pages; a 256-MB physical memory has only 2^16 4-KB page frames, so the inverted table — indexed by a hash on the virtual page — needs just 2^16 entries mapping virtual page → page frame.]

Hash table: virtual address → hash(virtual address) → (virtual page, page frame) → lookup(page frame)

04 – 21 Memory Management/4.3 Virtual Memory

SLIDE 23

Virtual Memory (1/2)

So far:

  • Main memory is considered as a large linear address space. Each address corresponds with a physical location.
  • The address space is partitioned into page frames. Each address is translated as a (page, offset) pair.

Idea: Why not assume an arbitrary address space to be available that is divided into logical pages. A logical page is then mapped to a page frame ... or not, when there’s no frame available.

04 – 22 Memory Management/4.3 Virtual Memory

SLIDE 24

Virtual Memory (2/2)

[Figure: a 16-bit virtual address, split into page number (8 bits) and offset (8 bits), is translated through the page table to a 12-bit physical address; some logical pages are marked as in memory, others are not.]

Question: What happens when a page is not in memory?

04 – 23 Memory Management/4.3 Virtual Memory

SLIDE 25

Page Replacement (1/2)

Problem: When a logical page is not in memory, we need to fetch it from swap space and map it onto a page frame. This may imply that another logical page needs to be replaced. Question: What would be the best choice? Idea: keep track of the usage of pages, and choose one that isn’t being used much. Assumption: it will also not be used much in the future. Solution: Have the hardware maintain two bits:

  • The R bit (referenced) indicates whether the page was accessed.
  • The M bit (modified) indicates whether the page has been modified.

Question: What could R = 0, M = 1 mean? Why do we need the M bit?

04 – 24 Memory Management/4.4 Page Replacement

SLIDE 26

Page Replacement (2/2)

Problem: use a simple algorithm that can be used to select the page to swap out.

  • Not recently used: Based on the R and M bits:

      class  R bit  M bit  comment
      0:     0      0      good candidate
      1:     0      1      old page
      2:     1      0      referenced, but okay
      3:     1      1      bad candidate

  • FIFO: keep track of when pages were swapped in. The page longest in memory is replaced. Often not that good (why not?). Solution: also take the R and M bits into account.
  • Second chance: Check the oldest page; if referenced, clear the R bit and move the page to the end of the FIFO list. What are we doing here?
  • Least-recently used: take the page that has not been used for a long time. Problem: how can we maintain an ordered list?
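The second-chance variant can be sketched over a small FIFO of frames (the data layout is illustrative; a real implementation would use a circular list to avoid the shifting):

```c
#define NFRAMES 4

static int r_bit[NFRAMES];                 /* R bit per frame */
static int fifo[NFRAMES] = {0, 1, 2, 3};   /* fifo[0] is the oldest frame */

/* Select a victim frame: the oldest page whose R bit is clear.
 * Referenced pages lose their R bit and move to the back instead. */
int second_chance(void)
{
    for (;;) {
        int oldest = fifo[0];
        if (!r_bit[oldest])
            return oldest;                 /* victim found */
        r_bit[oldest] = 0;                 /* clear R: second chance */
        for (int i = 0; i < NFRAMES - 1; i++) fifo[i] = fifo[i + 1];
        fifo[NFRAMES - 1] = oldest;        /* re-queue as "newly loaded" */
    }
}
```

If every R bit is set, the loop clears them all and the algorithm degenerates into plain FIFO — which answers the "what are we doing here?" question.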

04 – 25 Memory Management/4.4 Page Replacement

SLIDE 27

Least Recently Used

Problem: If we want to keep track of LRU pages, we have to maintain a list that needs to be updated on every memory reference. Not a very cheap operation. Hardware doesn’t really help (see book).

[Figure: aging for six pages (0–5) over clock ticks 0–4. At each tick the R bits (1,0,1,0,1,1 at tick 0) are shifted into the leftmost bit of each page’s 8-bit counter: after tick 0 the counters read 10000000, 00000000, 10000000, 00000000, 10000000, 10000000; after tick 4 they read 01111000, 10110000, 10001000, 00100000, 01011000, 00101000. The page with the lowest counter is the LRU candidate.]

Software aging checks at each clock tick whether a page has been referenced. It keeps, per page, a shift register of which the leftmost bit is filled with R.
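One clock tick of this aging scheme takes a couple of lines; the page count and counter width below mirror the figure, everything else is illustrative:

```c
#define NPAGES 6

static unsigned char age[NPAGES];   /* one 8-bit shift register per page */

/* At each clock tick: shift every counter right one bit and fill the
 * leftmost bit with that page's R bit (then the R bits are cleared). */
void aging_tick(const int r[NPAGES])
{
    for (int i = 0; i < NPAGES; i++)
        age[i] = (unsigned char)((age[i] >> 1) | (r[i] ? 0x80 : 0x00));
}
```

Replaying tick 0 of the figure (R bits 1,0,1,0,1,1) yields counters 0x80, 0x00, 0x80, 0x00, 0x80, 0x80; the page with the numerically smallest counter is evicted.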

04 – 26 Memory Management/4.4 Page Replacement

SLIDE 28

Page Table Entry

This leaves us with the following (general) format for a page table entry:

[Figure: a page table entry holds the page frame number plus bits for: caching disabled, referenced, modified, protection, and present/absent.]

Question: What’s the use of disabling caching of a page?

04 – 27 Memory Management/4.4 Page Replacement

SLIDE 29

Design Issues Paging Systems

Recall: basic paging memory model:
1: Translate a logical address addr to a (page, offset) pair where page corresponds to a physical page.
2: If page == NULL, you’ll have to fetch the part of the program that addr is referring to from secondary storage.
3: Fetching a page from swap space generally implies that some other page has to be replaced.

Note: this is a demand paging strategy: a page is fetched only when it is really needed to continue program execution (i.e. on demand). Problem: if you throw out the “wrong” page, you may find yourself continuously fetching pages from swap space ⇒ thrashing. Question: What’s a “wrong” page?

04 – 28 Memory Management/4.5 Design Issues Paging Systems

SLIDE 30

Working Set Model

Definition: the working set of a process is the set of pages the process currently needs to execute. Ideally: what you want is to have the working set in memory when the process starts execution. Suggests: loading pages from the process’ working set in advance, i.e. adopt a prepaging strategy. Problem: The working set generally will not fit entirely into main memory ⇒ you have to maintain a process’ “best” pages in memory, and drop those that are not used often. Question: (1) What’s a “best” page, (2) what’s a “not often used” page?

Design issue: do we search for a replaceable page in the set of pages belonging to the process, or do we choose from the set of all pages in memory?

04 – 29 Memory Management/4.5 Design Issues Paging Systems

SLIDE 31

Allocation Policies (1/2)

Problem: Suppose we need to swap in a page for A, but have to make space available in memory. Do we replace one of A’s pages, or will any page do?

[Figure: (a) pages of processes A, B, and C, each with an age value; (b) local replacement: A’s page with the lowest age (A5) is replaced by A6; (c) global replacement: the page with the lowest age in all of memory (B3) is replaced by A6.]

Solution: use global allocation: it works better in avoid- ing thrashing.

04 – 30 Memory Management/4.5 Design Issues Paging Systems

SLIDE 32

Allocation Policies (2/2)

Load control: One reason why many page faults may occur is that there are too many processes in memory, so that no process can be allocated enough page frames for its working set. Solution: reduce the number of concurrent processes.

[Figure: page faults/sec versus number of page frames assigned. A: not enough page frames assigned to processes (many faults); B: too many page frames leads to waste of memory.]

04 – 31 Memory Management/4.5 Design Issues Paging Systems

SLIDE 33

Choosing the Page Size

Note: the operating system can choose a page size “independent” of the page size chosen by the hardware (how?).

  • Small is good: it avoids internal fragmentation. On average, half of the last page of each segment will be wasted. With n segments, this adds up to n · p/2 wasted space (p is the page size).
  • Small is bad:
    – Generally more accumulated transfer time between main memory and disk (why?).
    – More page-table overhead. With average process size s, page table entry size e, and page size p, the overhead is

        o = ⌈s·e/p⌉ + p/2    (1)

      which is minimal for p ≈ √(2se).

Question: What’s (1)?

04 – 32 Memory Management/4.5 Design Issues Paging Systems

SLIDE 34

Segmentation (1/2)

Idea: Instead of allocating a process one huge (logically) contiguous piece of memory, offer it separate address spaces to allocate the different parts of the program it is running ⇒ segments.

[Figure: a compiler’s tables (symbol table, source text, constant table, parse tree, call stack) in one virtual address space: the symbol table has bumped into the source text table, while part of the space allocated to the parse tree is still free.]

04 – 33 Memory Management/4.6 Segmentation

slide-35
SLIDE 35

Segmentation (2/2)

[Figure: each table in its own segment: segment 0 symbol table, segment 1 source text, segment 2 constants, segment 3 parse tree, segment 4 call stack; each segment (sized 8K–20K) can grow or shrink independently from address 0K.]

04 – 34 Memory Management/4.6 Segmentation

SLIDE 36

Segmentation versus Paging

Consideration                                                  Paging   Segmentation
Need the programmer be aware that this technique is used?      No       Yes
How many linear address spaces are there?                      1        Many
Can the total address space exceed physical memory?            Yes      Yes
Can procedures and data be distinguished and
separately protected?                                          No       Yes
Can tables whose size fluctuates be accommodated easily?       No       Yes
Is sharing of procedures between users facilitated?            No       Yes

Why was this technique invented?
  Paging: to get a large linear address space without having to buy more physical memory.
  Segmentation: to allow programs and data to be broken up into logically independent address spaces and to aid sharing and protection.

04 – 35 Memory Management/4.6 Segmentation

SLIDE 37

Segmentation with Paging Intel Pentium

Problem: If segments are large, we may want to split them up into pages, which, of course, need not be in memory all the time. The Pentium can hold 16K independent segments, which are in turn paged.

  • A selector points to an entry in a descriptor table.
  • A descriptor contains a complete description of a segment.

[Figure: a Pentium segment descriptor (2 × 32 bits): base (bits 0–15, 16–23, 24–31), limit (bits 0–15, 16–19), G bit (0: limit is in bytes, 1: limit is in pages), D bit (0: 16-bit segment, 1: 32-bit segment), P bit (0: segment absent from memory, 1: present), DPL (privilege level 0–3), S bit (0: system, 1: application), and segment type and protection fields.]

Note: paging can be explicitly enabled and disabled ⇒ we have support for pure segmentation, or paged segmentation.

04 – 36 Memory Management/4.6 Segmentation

SLIDE 38

Mapping to Physical Addresses

Essence: We map the base address in the descriptor, plus the given offset, to a 32-bit address. This address is subsequently mapped via a two-level page table to a physical address.

[Figure: (a) the selector’s descriptor supplies a base address (plus limit and other fields); base + offset yields a 32-bit linear address. (b) the linear address is split into Dir (10 bits), Page (10 bits), and Offset (12 bits); the page directory entry points to a page table (1024 entries), the page table entry points to the page frame, and the offset selects the word.]

04 – 37 Memory Management/4.6 Segmentation

SLIDE 39

Memory Management: MINIX (1/2)

Memory management in MINIX 3 is done by the Process Manager (PM).

  • No swapping or paging: a process is created, gets its memory, and nothing changes until the process terminates.
  • Simple first-fit memory allocation scheme; allocated memory never shrinks or grows.

Reasons: see Tanenbaum. Basically: just simplicity. Note: Most modern PC-UNIX versions support advanced memory management, but often assume at least an 80386, i.e. some hardware support.

04 – 38 Memory Management/4.7 PM in MINIX

SLIDE 40

Memory Management: MINIX (2/2)

Unusual: MINIX has PM outside the kernel:

[Figure: the MINIX 3 layered structure: layer 1 kernel (with clock task and system task) in kernel mode; layer 2 device drivers (disk, TTY, Ethernet); layer 3 server processes (process manager, file system, info server, network server); layer 4 user processes (init and user processes); layers 2–4 run in user mode.]

Note: Having PM outside the kernel provides separation of policy (what do you want to do) and mechanism (how do you realize things):

  • Policies are related to (1) swapping and paging, (2) allocation and deallocation of memory, etc.
  • Mechanisms are related to (1) setting memory maps, (2) adjusting process-related data structures, etc.

04 – 39 Memory Management/4.7 PM in MINIX

SLIDE 41

Memory model

[Figure: the MINIX 3 memory model: a process consists of a text segment, a data + bss segment that grows upward (or downward) when BRK calls are made, a gap, and a stack segment that grows downward; total memory use is bounded by an upper memory limit. A::fork() clones A into a child placed elsewhere in memory; Child(A)::exec(C) replaces the child’s memory with the program to execute.]

04 – 40 Memory Management/4.7 PM in MINIX

SLIDE 42

PM – System Calls

  • FORK(): called by a process to create another one. PM clones the caller to a child that is placed somewhere in memory.
  • EXIT(status): terminate the caller. PM cleans up the memory that was allocated to the caller.
  • WAIT(): called by a parent to suspend until a child is dead. Because PM deals with process termination, it handles WAIT() as well. The same goes for SIGNAL, KILL, ALARM, and PAUSE.
  • BRK(size): request more/less memory. PM responds by checking that the stack and data parts do not overlap. If so, it adjusts the maps for the process. Note that no memory is (de)allocated.
  • EXEC(stack): load a program to be executed by the caller. PM replaces the caller’s currently assigned memory by the memory needed to execute the program.

04 – 41 Memory Management/4.7 PM in MINIX

SLIDE 43

Main Loop Process Manager

18041 PUBLIC int main(){
18044   int result, s, proc_nr;
18045   struct mproc *rmp;
18046   sigset_t sigset;
18047
18048   pm_init();     /* initialize process manager tables */
18049
18051   while (TRUE) {
18052     get_work();  /* wait for an PM system call */
.....
18064     if ((unsigned) call_nr >= NCALLS) {
18065       result = ENOSYS;
18066     } else {
18067       result = (*call_vec[call_nr])();
18068     }
18069
18070     /* Send the results back to the user to indicate completion. */
18071     if (result != SUSPEND) setreply(who, result); /* Prepare reply message */
.....
18075     /* Send out all pending reply messages, including the answer to
18076      * the call just made above. The processes must not be swapped out.
18077      */
18078     for (proc_nr=0, rmp=mproc; proc_nr < NR_PROCS; proc_nr++, rmp++) {
18084       if ((rmp->mp_flags & (REPLY | ONSWAP | IN_USE | ZOMBIE)) ==
18085           (REPLY | IN_USE)) {
18086         if ((s = send(proc_nr, &rmp->mp_reply)) != OK) {
18087           panic(__FILE__,"PM can't reply to", proc_nr);
18088         }
18089         rmp->mp_flags &= ~REPLY;
18090       }
18091     }
18092   }
18093   return(OK);
18094 }
.....

18099 PRIVATE void get_work(){
18101   /* Wait for the next message and extract useful information from it. */
18102   if (receive(ANY, &m_in) != OK) panic(__FILE__,"PM receive error", NO_NUM);
18103   who = m_in.m_source;      /* who sent the message */
18104   call_nr = m_in.m_type;    /* system call number */
18105
18110   mp = &mproc[who < 0 ? PM_PROC_NR : who];
18111 }

04 – 42 Memory Management/4.7 PM in MINIX

SLIDE 44

PM – Data Structures

02829 struct mem_map {
02830   vir_clicks mem_vir;     /* virtual address */
02831   phys_clicks mem_phys;   /* physical address */
02832   vir_clicks mem_len;     /* length */
02833 };

[Figure: (a) a process occupying 200K–210K (0x32000–0x34800, with boundaries at 203K/0x32c00, 207K/0x33c00, 208K/0x34000); (b) its memory map (physical, virtual, length, in clicks) with combined I&D space: text at 0xc8, data at 0xc8, stack at 0xd0; (c) the map with separate I&D space: text at 0xc8 (length 0x3), data at 0xcb (length 0x4), stack at 0xd0 (length 0x2).]

Note: All lengths are measured in clicks, which are 1024 bytes.
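Given the mem_map layout above, translating a virtual click within one segment to a physical byte address can be sketched as follows (CLICK_SHIFT = 10 since a click is 1024 bytes; types are simplified from the MINIX headers and the function name is illustrative):

```c
#define CLICK_SHIFT 10   /* 1 click = 1024 bytes */

struct mem_map {
    unsigned mem_vir;    /* virtual address, in clicks  */
    unsigned mem_phys;   /* physical address, in clicks */
    unsigned mem_len;    /* length, in clicks           */
};

/* Map a virtual click address within one segment to a physical byte
 * address; returns (unsigned long)-1 when it falls outside the segment. */
unsigned long vir2phys_bytes(const struct mem_map *m, unsigned vir_click)
{
    if (vir_click < m->mem_vir || vir_click >= m->mem_vir + m->mem_len)
        return (unsigned long) -1;   /* outside the segment's map */
    return (unsigned long)(m->mem_phys + (vir_click - m->mem_vir))
           << CLICK_SHIFT;
}
```

For example, a data segment at physical click 0xc8 (= 200K) with virtual base 0 puts virtual click 2 at byte address 0xca · 1024 = 0x32800.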

04 – 43 Memory Management/4.7 PM in MINIX

SLIDE 45

PM – Shared Text

[Figure: two processes sharing one text segment: both memory maps reference the same physical text segment (at click 0xc8), while each process has its own data and stack segments, with a gap between them, at different physical addresses; physically, the shared text appears once, followed by each process’ private data and stack.]

04 – 44 Memory Management/4.7 PM in MINIX

SLIDE 46

Forking a Process (1/2)

18430 PUBLIC int do_fork()
18431 {
18432 /* The process pointed to by 'mp' has forked. Create a child process. */
18433   register struct mproc *rmp;   /* pointer to parent */
18434   register struct mproc *rmc;   /* pointer to child */
18435   int child_nr, s;
18436   phys_clicks prog_clicks, child_base;
18437   phys_bytes prog_bytes, parent_abs, child_abs;   /* Intel only */
18438   pid_t new_pid;
18439
18440   /* If tables might fill up during FORK, don't even start since recovery half
18441    * way through is such a nuisance.
18442    */
18443   rmp = mp;
18444   if ((procs_in_use == NR_PROCS) ||
18445       (procs_in_use >= NR_PROCS-LAST_FEW && rmp->mp_effuid != 0))
18446   {
18447     printf("PM: warning, process table is full!\n");
18448     return(EAGAIN);
18449   }
18450
18451   /* Determine how much memory to allocate. Only the data and stack need to
18452    * be copied, because the text segment is either shared or of zero length.
18453    */
18454   prog_clicks = (phys_clicks) rmp->mp_seg[S].mem_len;
18455   prog_clicks += (rmp->mp_seg[S].mem_vir - rmp->mp_seg[D].mem_vir);
18456   prog_bytes = (phys_bytes) prog_clicks << CLICK_SHIFT;
18457   if ( (child_base = alloc_mem(prog_clicks)) == NO_MEM) return(ENOMEM);
18458
18459   /* Create a copy of the parent's core image for the child. */
18460   child_abs = (phys_bytes) child_base << CLICK_SHIFT;
18461   parent_abs = (phys_bytes) rmp->mp_seg[D].mem_phys << CLICK_SHIFT;
18462   s = sys_abscopy(parent_abs, child_abs, prog_bytes);
18463   if (s < 0) panic(__FILE__,"do_fork can't copy", s);
18464
18465   /* Find a slot in 'mproc' for the child process. A slot must exist. */
18466   for (rmc = &mproc[0]; rmc < &mproc[NR_PROCS]; rmc++)
18467     if ( (rmc->mp_flags & IN_USE) == 0) break;
18468

04 – 45 Memory Management/4.7 PM in MINIX

SLIDE 47

Forking a Process (2/2)

18469   /* Set up the child and its memory map; copy its 'mproc' slot from parent. */
18470   child_nr = (int)(rmc - mproc);   /* slot number of the child */
18471   procs_in_use++;
18472   *rmc = *rmp;                     /* copy parent's process slot to child's */
18473   rmc->mp_parent = who;            /* record child's parent */
18474   /* inherit only these flags */
18475   rmc->mp_flags &= (IN_USE|SEPARATE|PRIV_PROC|DONT_SWAP);
18476   rmc->mp_child_utime = 0;         /* reset administration */
18477   rmc->mp_child_stime = 0;         /* reset administration */
18478
18479   /* A separate I&D child keeps the parents text segment. The data and stack
18480    * segments must refer to the new copy.
18481    */
18482   if (!(rmc->mp_flags & SEPARATE)) rmc->mp_seg[T].mem_phys = child_base;
18483   rmc->mp_seg[D].mem_phys = child_base;
18484   rmc->mp_seg[S].mem_phys = rmc->mp_seg[D].mem_phys +
18485       (rmp->mp_seg[S].mem_vir - rmp->mp_seg[D].mem_vir);
18486   rmc->mp_exitstatus = 0;
18487   rmc->mp_sigstatus = 0;
18488
18489   /* Find a free pid for the child and put it in the table. */
18490   new_pid = get_free_pid();
18491   rmc->mp_pid = new_pid;           /* assign pid to child */
18492
18493   /* Tell kernel and file system about the (now successful) FORK. */
18494   sys_fork(who, child_nr);
18495   tell_fs(FORK, who, child_nr, rmc->mp_pid);
18496
18497   /* Report child's memory map to kernel. */
18498   sys_newmap(child_nr, rmc->mp_seg);
18499
18500   /* Reply to child to wake it up. */
18501   setreply(child_nr, 0);           /* only parent gets details */
18502   rmp->mp_reply.procnr = child_nr; /* child's process number */
18503   return(new_pid);                 /* child's pid */
18504 }

04 – 46 Memory Management/4.7 PM in MINIX

SLIDE 48

PM – Exiting Processes

Problem: A process can wait() for its children to terminate, and children can exit() or get kill()ed to do so. But what happens when parent and child are out of sync?

  • Parent is prepared to wait after the child has already terminated ⇒ the child becomes a zombie: it is not removed, but its memory is freed. It cannot do or respond to anything.
  • Parent terminates before the child does ⇒ because the parent has no registered wait, the child would eventually become a zombie and remain so. Solution: orphaned children are assigned to the init process.

04 – 47 Memory Management/4.7 PM in MINIX

SLIDE 49

Exiting a Process (1/2)

18521 PUBLIC void pm_exit(rmp, exit_status)
18522 register struct mproc *rmp;   /* pointer to the process to be terminated */
18523 int exit_status;              /* the process' exit status (for parent) */
18524 {
18525 /* A process is done. Release most of the process' possessions. If its
18526  * parent is waiting, release the rest, else keep the process slot and
18527  * become a zombie.
18528  */
18529   register int proc_nr;
18530   int parent_waiting, right_child;
18531   pid_t pidarg, procgrp;
18532   struct mproc *p_mp;
18533   clock_t t[5];
18534
18535   proc_nr = (int) (rmp - mproc);   /* get process slot number */
18536
18537   /* Remember a session leader's process group. */
18538   procgrp = (rmp->mp_pid == mp->mp_procgrp) ? mp->mp_procgrp : 0;
18539
18540   /* If the exited process has a timer pending, kill it. */
18541   if (rmp->mp_flags & ALARM_ON) set_alarm(proc_nr, (unsigned) 0);
18542
18543   /* Do accounting: fetch usage times and accumulate at parent. */
18544   sys_times(proc_nr, t);
18545   p_mp = &mproc[rmp->mp_parent];   /* process' parent */
18546   p_mp->mp_child_utime += t[0] + rmp->mp_child_utime;   /* add user time */
18547   p_mp->mp_child_stime += t[1] + rmp->mp_child_stime;   /* add system time */
18548
18549   /* Tell the kernel and FS that the process is no longer runnable. */
18550   tell_fs(EXIT, proc_nr, 0, 0);    /* file system can free the proc slot */
18551   sys_exit(proc_nr);
18552
18553   /* Pending reply messages for the dead process cannot be delivered. */
18554   rmp->mp_flags &= ~REPLY;
18555
18556   /* Release the memory occupied by the child. */
18557   if (find_share(rmp, rmp->mp_ino, rmp->mp_dev, rmp->mp_ctime) == NULL) {
18558     /* No other process shares the text segment, so free it. */
18559     free_mem(rmp->mp_seg[T].mem_phys, rmp->mp_seg[T].mem_len);
18560   }

04 – 48 Memory Management/4.7 PM in MINIX

SLIDE 50

Exiting a Process (2/2)

18561   /* Free the data and stack segments. */
18562   free_mem(rmp->mp_seg[D].mem_phys,
18563       rmp->mp_seg[S].mem_vir
18564       + rmp->mp_seg[S].mem_len - rmp->mp_seg[D].mem_vir);
18565
18566   /* The process slot can only be freed if the parent has done a WAIT. */
18567   rmp->mp_exitstatus = (char) exit_status;
18568
18569   pidarg = p_mp->mp_wpid;          /* who's being waited for? */
18570   parent_waiting = p_mp->mp_flags & WAITING;
18571   right_child =                    /* child meets one of the 3 tests? */
18572       (pidarg == -1 || pidarg == rmp->mp_pid || -pidarg == rmp->mp_procgrp);
18573
18574   if (parent_waiting && right_child) {
18575     cleanup(rmp);                  /* tell parent and release child slot */
18576   } else {
18577     rmp->mp_flags = IN_USE|ZOMBIE; /* parent not waiting, zombify child */
18578     sig_proc(p_mp, SIGCHLD);       /* send parent a "child died" signal */
18579   }
18580
18581   /* If the process has children, disinherit them. INIT is the new parent. */
18582   for (rmp = &mproc[0]; rmp < &mproc[NR_PROCS]; rmp++) {
18583     if (rmp->mp_flags & IN_USE && rmp->mp_parent == proc_nr) {
18584       /* 'rmp' now points to a child to be disinherited. */
18585       rmp->mp_parent = INIT_PROC_NR;
18586       parent_waiting = mproc[INIT_PROC_NR].mp_flags & WAITING;
18587       if (parent_waiting && (rmp->mp_flags & ZOMBIE)) cleanup(rmp);
18588     }
18589   }
18590
18591   /* Send a hangup to the process' process group if it was a session leader. */
18592   if (procgrp != 0) check_sig(-procgrp, SIGHUP);
18593 }

04 – 49 Memory Management/4.7 PM in MINIX

SLIDE 51

Waiting for a Process

18598 PUBLIC int do_waitpid() {
  .....
18609   register struct mproc *rp;
18610   int pidarg, options, children;
  .....
18617 /* Is there a child waiting to be collected? At this point, pidarg != 0:
18618  *   pidarg  >  0 means pidarg is pid of a specific process to wait for
18619  *   pidarg == -1 means wait for any child
18620  *   pidarg  < -1 means wait for any child whose process group = -pidarg
18621  */
18622   children = 0;
18623   for (rp = &mproc[0]; rp < &mproc[NR_PROCS]; rp++) {
18624       if ( (rp->mp_flags & IN_USE) && rp->mp_parent == who) {
18625           /* The value of pidarg determines which children qualify. */
18626           if (pidarg  > 0 && pidarg != rp->mp_pid) continue;
18627           if (pidarg < -1 && -pidarg != rp->mp_procgrp) continue;
18628
18629           children++;                 /* this child is acceptable */
18630           if (rp->mp_flags & ZOMBIE) {
18631               /* This child meets the pid test and has exited. */
18632               cleanup(rp);            /* this child has already exited */
18633               return(SUSPEND);
18634           }
18635           if ((rp->mp_flags & STOPPED) && rp->mp_sigstatus) {
18636               /* This child meets the pid test and is being traced. */
18637               mp->mp_reply.reply_res2 = 0177 | (rp->mp_sigstatus << 8);
18638               rp->mp_sigstatus = 0;
18639               return(rp->mp_pid);
18640           }
18641       }
18642   }
18643
18644   /* No qualifying child has exited. Wait for one, unless none exists. */
18645   if (children > 0) {
18646       /* At least one child meets the pid test, but has not exited. */
18647       if (options & WNOHANG) return(0); /* parent does not want to wait */
18648       mp->mp_flags |= WAITING;          /* parent wants to wait */
18649       mp->mp_wpid = (pid_t) pidarg;     /* save pid for later */
18650       return(SUSPEND);                  /* do not reply, let it wait */
18651   } else {
18652       /* No child even meets the pid test. Return error immediately. */
18653       return(ECHILD);                   /* no - parent has no children */
18654   }
18655 }

04 – 50 Memory Management/4.7 PM in MINIX

SLIDE 52

PM – Executing Programs

[Figure: the shell command "ls -l f.c g.c" becomes the system call execve("/bin/ls", argv, envp); argv holds "ls", "f.c", "g.c" and envp holds "HOME=/usr/ast", which end up in the process's internal data structures.]

Idea: Transform the parameters of the execve() system call (that refer to process data structures) to a set of parameters that have been pushed onto the kernel stack for the procedure exec().

The nitty-gritty stuff is really not that interesting. That everything's on the stack is. See Tanenbaum for details.

04 – 51 Memory Management/4.7 PM in MINIX

SLIDE 53

Signal Handling (1/2)

  • A process tells the OS that it wants to ignore/catch certain signals.

  • Catching a signal is done by installing a signal handler: the process passes the address of a function to the OS.

  • Upon ending a signal handler, a special system call is made to clean up things (transparent to the programmer).

The problem is that we need to switch the running process to its signal handler, and afterwards clean up things by automatically invoking a sigreturn function.

04 – 52 Memory Management/4.7 PM in MINIX

SLIDE 54

Signal Handling (2/2)

[Figure: four snapshots of the process' stack and of the stack frame (CPU regs) kept in the process table:
(a) Before — stack frame original; the stack holds the process' local vars and return address.
(b) Handler executing — stack frame modified (ip = handler); pushed on the stack: a sigframe structure, ret addr 1, then the handler's local vars and ret addr 2.
(c) Sigreturn executing — stack frame original again; sigreturn's local vars and ret addr 2 remain on the stack.
(d) Back to normal — stack frame and stack restored to their original state.]

Note: ret addr 1 is the return address of sigreturn.

04 – 53 Memory Management/4.7 PM in MINIX