CSE 120, Day 5: Memory
July 18, 2006
Instructor: Neil Rhodes

Memory

For multiprogramming, must have multiple processes. Each process must be in memory in order to execute. Possibilities:

– One process in memory. Swapping: swap it out to disk, swap in a new one

[Figure: physical memory holding the OS and a single process 1]

Memory

For multiprogramming, must have multiple processes. Each process must be in memory in order to execute. Possibilities:

– Multiple processes in memory, each in its own partition
  – Fixed-size equal partitions

Internal fragmentation: wasted memory within allocated memory blocks

[Figure: physical memory with the OS and equal-size partitions for processes 1, 2, and 3 at addresses a, a+n, a+2n, a+3n]

Memory

For multiprogramming, must have multiple processes. Each process must be in memory in order to execute. Possibilities:

– Multiple processes in memory, each in its own partition
  – Fixed-size non-equal partitions

External fragmentation: wasted memory between allocated memory blocks

[Figure: physical memory with the OS and unequal partitions for processes 1, 2, and 3 at addresses a, b, c, d]


Placement Algorithms

Need to allocate a memory block of size n.

First-fit

– Find the first free block of size at least n

Best-fit

– Find the smallest open memory block of size at least n

Worst-fit

– Find the largest open memory block of size at least n

Next-fit

– Starting with the last open block considered, find the next block of size at least n

Buddy system

– Round n up to a power of 2. Free memory is kept in powers of 2. If there is no free block of size n, split a larger chunk in half until a block of size n is obtained
– When a block is freed, if its buddy (of the same size) is free, merge them together (recursively)
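The first-, best-, and worst-fit policies can be sketched over a free list of (start, size) holes. This is an illustrative sketch, not the course's reference implementation:

```python
# Hypothetical free list: each hole is (start address, size).
def first_fit(holes, n):
    """Start of the first hole with size >= n, or None."""
    for start, size in holes:
        if size >= n:
            return start
    return None

def best_fit(holes, n):
    """Start of the smallest hole with size >= n, or None."""
    fits = [(size, start) for start, size in holes if size >= n]
    return min(fits)[1] if fits else None

def worst_fit(holes, n):
    """Start of the largest hole with size >= n, or None."""
    fits = [(size, start) for start, size in holes if size >= n]
    return max(fits)[1] if fits else None

holes = [(0, 100), (200, 30), (400, 60)]
first_fit(holes, 50)   # 0   (first hole big enough)
best_fit(holes, 50)    # 400 (smallest hole that fits, size 60)
worst_fit(holes, 50)   # 0   (largest hole, size 100)
```

Note how best-fit and worst-fit must scan every hole, while first-fit can stop at the first match.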

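The buddy system's round-up, split, and recursive merge can be sketched as follows (the class and method names are invented for illustration):

```python
def round_up_pow2(n):
    """Smallest power of 2 that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

class Buddy:
    def __init__(self, total):        # total must be a power of 2
        self.total = total
        self.free = {total: [0]}      # block size -> free block addresses

    def alloc(self, n):
        size = round_up_pow2(n)       # round n up to a power of 2
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                    # no block of this size: look larger
        if s > self.total:
            return None               # out of memory
        addr = self.free[s].pop()
        while s > size:               # split in half until size n is reached
            s //= 2
            self.free.setdefault(s, []).append(addr + s)
        return addr

    def release(self, addr, size):
        while size < self.total:
            buddy = addr ^ size       # a block's buddy differs in one address bit
            if buddy in self.free.get(size, []):
                self.free[size].remove(buddy)   # merge, then try the next level up
                addr = min(addr, buddy)
                size *= 2
            else:
                break
        self.free.setdefault(size, []).append(addr)

b = Buddy(64)
a0 = b.alloc(10)       # rounds up to 16; splits 64 -> 32 -> 16
a1 = b.alloc(16)
b.release(a0, 16)
b.release(a1, 16)      # buddies merge back into the full 64-byte block
```

The XOR trick works because a block of size 2^k and its buddy differ only in bit k of their addresses.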

Address Binding

When to map instructions and data references to actual memory locations:

Link time

– Absolute addressing. Works if there is a well-known address at which user programs are loaded

Load time

– Linker-loader relocates code as it loads it into memory
  – Changes all data references to take into account where the code was loaded
– Position-independent code
  – On Mac OS 9, subroutine calls are PC-relative; data references are relative to the A5 register. The linker sets the A5 register at load time

Execution time

– Code uses logical addresses, which are converted to physical addresses
– Hardware: relocation register

[Figure: a logical address is added to the relocation register a to form a physical address within process 1's partition]

Process Protection

How to protect the memory of a process (or the OS) from other running processes?

Hardware solution: base (aka relocation) and limit registers

– Disadvantage: can’t easily share memory between processes

Software:

– Tagged data (e.g., Smalltalk, Lisp). No raw pointers
– Virtual machine

[Figure: a logical address is compared against the limit register n; if smaller, it is added to the relocation register a to form the physical address; otherwise an exception is raised]
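The base/limit check the hardware performs on every memory reference can be sketched as follows (the register values are invented for illustration):

```python
# Hypothetical register contents for one process.
LIMIT = 0x1000        # limit register: size of the process's memory
RELOCATION = 0x4000   # relocation (base) register: where it was loaded

def to_physical(logical):
    """Model of the hardware check: limit test, then relocation."""
    if not (0 <= logical < LIMIT):
        raise MemoryError("protection exception")
    return logical + RELOCATION

to_physical(0x10)     # 0x4010
```

Any logical address at or beyond LIMIT raises the exception instead of touching another process's memory.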

Within a Process's Address Space

[Figure: a process's address space of size n: text (code), data, heap, stack]


Fragmentation

External Fragmentation

Space wasted between allocated memory blocks

Internal Fragmentation

Space wasted within allocated memory blocks

Solutions to external fragmentation

Compaction

– Move blocks around dynamically to make free space contiguous
– Only possible if relocation is dynamic, at execution time

Non-contiguous

– Don’t require all of the memory of a process to be contiguous

  • Segmentation
  • Paging

Solutions to internal fragmentation

Don’t use fixed-size blocks


Segmentation

Provides multiple address spaces

– Handy for separate data/stack/code
– Good for sharing code/data between processes
– Easy way to specify protection
  – No execute on stack!

The address is (segment number, offset within segment). Needs segment base/segment limit registers (an array of registers). The programmer (or compiler) must specify the different segments.

[Figure: a process with a single segment containing text (code), data, heap, and stack, versus a process with four segments: text (segment 0), data (segment 1), heap (segment 2), stack (segment 3)]

Segmentation

Pros

– Easy to share segments between processes
– Segment-specific protection

Cons

– Programmer/compiler/linker must be aware of the different segments
– Still must worry about external fragmentation: each segment must occupy contiguous physical memory

Segmentation Example

8-bit address:

– 2 bits segment, 6 bits offset

Example code:

– 0x04: load 0x19, r0
– 0x06: push r0, sp
– 0x08: add sp, #2

[Figure: segment sizes: text (segment 0) 40, data (segment 1) 12, heap (segment 2) 32, stack (segment 3) 35; PC = 0x4, SP = 0xA5]
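Decoding the 8-bit addresses above (2-bit segment, 6-bit offset) can be sketched as follows; the second example address is invented for illustration:

```python
SEG_BITS, OFF_BITS = 2, 6   # 8-bit address: top 2 bits segment, low 6 offset

def decode(addr):
    """Split an 8-bit address into (segment number, offset)."""
    return addr >> OFF_BITS, addr & ((1 << OFF_BITS) - 1)

decode(0x19)   # (0, 25): segment 0 (text), offset 25
decode(0xC5)   # (3, 5): segment 3 (stack), offset 5 -- illustrative address
```

The hardware then checks the offset against segment 3's limit before adding the segment base.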


Swapping

Swapping moves the memory of a segment (or an entire process) out to disk.

– The process can't run without its memory
– On context switch in, memory is loaded back from disk
– Slow, since there can be lots of data

Paging

Map contiguous virtual address space to non-contiguous physical address space

Idea

– Break the virtual address space into fixed-size pages (aka virtual pages)
– Map those fixed-size pages into same-sized physical frames (aka physical pages)

[Figure: a process's contiguous pages a, b, c, d mapped to non-contiguous frames in physical memory]

Implementing Paging

Page table

Index into table is page number. Each entry is a page table entry:

– Frame number
– Valid/invalid bit
– Permissions
– …

[Figure: a page table mapping a process's text, heap, and stack pages to frames with permissions (rx/rw); unused pages are marked invalid]

Implementing Paging

Usually done in hardware (Memory Management Unit—MMU)

Page-Table Base Register and Page-Table Limit Register

– Must be saved/restored on a context-switch

– Free list of frames
– Page table per process (or per segment)

convertToPhysicalAddress(virtualAddress) {
  pageNumber = upper bits of virtualAddress
  if pageNumber out of range, generate exception (aka fault)
  if pageTable[pageNumber].invalid, generate page fault
  offset = lower bits of virtualAddress
  upper bits of physicalAddress = pageTable[pageNumber].frameNumber
  lower bits of physicalAddress = offset
}
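The lookup above can be made concrete with explicit bit operations; the 4-bit offset (16-byte pages) and the tiny page table here are invented for illustration:

```python
PAGE_BITS = 4                        # 16-byte pages: offset = low 4 bits
# Each entry: (valid, frame number); a real PTE also carries permissions.
page_table = [(True, 3), (True, 5), (False, 0)]

def to_physical(vaddr):
    page = vaddr >> PAGE_BITS                 # upper bits: page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)   # lower bits: offset
    if page >= len(page_table):
        raise MemoryError("address out of range")   # exception (fault)
    valid, frame = page_table[page]
    if not valid:
        raise MemoryError("page fault")
    return (frame << PAGE_BITS) | offset      # frame number, same offset

to_physical(0x12)   # page 1 -> frame 5, offset 2, so 0x52
```

The offset passes through unchanged; only the page-number bits are replaced by the frame number.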

Advantages

No external fragmentation

– Unallocated memory can be allocated to any process

– Transparent to the programmer/linker/compiler
– Can put individual frames out to disk (virtual memory)

Disadvantages

– Translating from a virtual to a physical address requires an additional memory access
– Unused pages in a process still require page table entries
– Internal fragmentation
  – On average, 1/2 frame size per segment


Relationship between physical and virtual addresses

– Virtual address the same size as the physical address
– Virtual address smaller than the physical address
– Virtual address bigger than the physical address

[Figure: in each case, a (page #, offset) virtual address maps to a (frame #, offset) physical address; only the width of the frame-number field differs]

Dealing with holes in virtual address space

With large address space (and small pages), page table can be very large

– 32-bit virtual addresses, 4KB pages: 2^20 page table entries per process

Multilevel page table

– The virtual address is broken up into multiple page numbers: PT1 and PT2
– PT1 is used as an index into the top-level page table to find the second-level page table
– PT2 is used as an index into the second-level page table to find the page table entry
– If parts of the address space are unused, the top-level page table can mark the second-level page table as not present
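The two-level walk can be sketched as follows; the field widths (4-bit PT1, 4-bit PT2, 8-bit offset) and the tables are invented for illustration:

```python
PT2_BITS, OFF_BITS = 4, 8

# Top-level table: only present second-level tables are stored,
# so holes in the address space cost nothing.
top_level = {0: {0: 7, 1: 3}}   # PT1 -> (PT2 -> frame number)

def walk(vaddr):
    pt1 = vaddr >> (PT2_BITS + OFF_BITS)
    pt2 = (vaddr >> OFF_BITS) & ((1 << PT2_BITS) - 1)
    offset = vaddr & ((1 << OFF_BITS) - 1)
    second = top_level.get(pt1)
    if second is None:            # hole: second-level table not present
        raise MemoryError("page fault (no second-level table)")
    if pt2 not in second:
        raise MemoryError("page fault")
    return (second[pt2] << OFF_BITS) | offset

walk(0x105)   # PT1=0, PT2=1, offset 5 -> frame 3, so 0x305
```

Note the two table lookups per translation; this is the extra cost the TLB (later slides) is designed to hide.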

[Figure: a top-level page table whose present entries point to second-level page tables; entries for unused parts of the address space are marked invalid]

Dealing with holes in virtual address space

Combination of segmentation/paging

– The top n bits are the segment number
  – Except on some machines, where the current segment is stored in a separate register rather than in the address itself
– The next m bits are the page number
– The remainder is the offset

[Figure: two segments (sizes 500 and 200), each with its own page table mapping its pages into physical memory]

Speed

Imagine a memory access takes 100ns. How long does a memory access take using paging?

– 100ns to access the page table entry
– 100ns to access the physical memory address

Solution

Use a cache

– Definition: a copy, on a faster medium, of data that resides elsewhere
– Examples:

  • Caching contents of files in memory
  • Caching contents of memory in off-chip (Level 2) cache
  • Caching contents of off-chip cache in on-chip (Level 1) cache
  • Caching contents of page-table entries

– Kinds of cache:

  • write-back (changes to data are made in the cache and written to permanent storage later)
  • write-through (changes to data are made in cache and permanent storage)

Using virtual addresses slows memory access down by a factor of 2.


Translation Lookaside Buffer (TLB) Implemented in Hardware

A cache that maps virtual page numbers to page frames

Associative memory: HW looks up in all cache entries simultaneously

– Usually not big: 64-128 entries

TLB entry:

– Page number
– Valid
– Modified
– Protection
– Page frame

If the page is not present in the TLB, do an ordinary page table lookup, then evict an entry from the TLB and add the new one

– Evict which entry?

Serial/Parallel lookup

– Serial: first look in the TLB. If not found, then look in the page table
– Parallel: look in the TLB and in the page table in parallel. If the page is not found in the TLB, the page table lookup is already in progress

Software TLB Management

The MMU doesn't handle page tables; software does. On a TLB miss, generate a TLB fault and let the OS deal with it:

– Search a larger in-memory cache of translations
  – The page containing the cache must itself be in the TLB, for speed
– If not in the cache, search the page table
– Once the page frame, etc. are found, update the TLB

Why not use hardware?

– Logic to search the page table takes space on the die
– That die area could instead be spent to:
  – Increase the memory cache
  – Reduce cost/power consumption

TLB Summary

Cost Example

– Direct memory access: 100ns
– Without TLB: 200ns (page table lookup first)
– With TLB:
  – Assume the cost of a TLB lookup is 10ns
  – Assume the TLB hit rate is 90%
  – Serial lookup: average cost = 0.9*110ns + 0.1*200ns = 119ns
  – Parallel lookup: average cost = 0.9*110ns + 0.1*(200ns-10ns) = 118ns
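The averages can be checked with a quick calculation (times in ns, following the slide's own accounting of miss costs):

```python
mem, tlb, hit = 100, 10, 0.9      # access times (ns) and TLB hit rate

no_tlb   = mem + mem                                   # page table, then data
serial   = hit * (tlb + mem) + (1 - hit) * 200         # miss costs 200ns: ~119
parallel = hit * (tlb + mem) + (1 - hit) * (200 - 10)  # overlap saves 10ns: ~118
```

Because the miss path is so expensive, even a small change in hit rate moves the average noticeably.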

Caches are very sensitive to:

– Hit rate
– Cost of a cache miss

Note that TLB must be flushed on context switch

Unless TLB entries include process ID


Inverted Page Tables

– Traditional page tables: 1 entry per virtual page
– Inverted page tables: 1 entry per physical frame of memory
– Why? Size
  – With 64-bit virtual addresses and 4KB pages, a traditional per-process page table is enormous; with 256MB of RAM, an inverted page table needs only 65536 entries

Page Table Entry:

– Process ID
– Virtual page number
– Additional PTE info

It is slow to search through a table with 65536 entries.

– Solution: a hash table. The key is the virtual page number; an entry contains the virtual page, process ID, and page frame

Advantage:

Page table memory is proportional to physical memory

– Not to the logical address space
– Not to the number of processes

Disadvantage

Hard to share memory between processes



Inverted Page Tables

Hash table

Space: proportional to number of allocated memory frames

– 1 entry in hash table for each allocated page
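A lookup in such a hash table can be sketched as follows (the bucket count, hash function, and entries are invented for illustration):

```python
BUCKETS = 8

def bucket(pid, vpage):
    """Toy hash of (process ID, virtual page number)."""
    return (pid ^ vpage) % BUCKETS

# Each bucket chains (pid, virtual page, frame) entries.
table = [[] for _ in range(BUCKETS)]
table[bucket(1, 0x12)].append((1, 0x12, 5))

def lookup(pid, vpage):
    for epid, evpage, frame in table[bucket(pid, vpage)]:
        if (epid, evpage) == (pid, vpage):   # chain may hold collisions
            return frame
    raise MemoryError("page fault")

lookup(1, 0x12)   # 5
```

The process ID is part of the key, so two processes mapping the same virtual page land in different entries.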

[Figure: the virtual page number p is hashed; the matching (pid, p) entry yields frame f, which is combined with the offset to form the physical address]

Segmentation vs. Paging

                                                            Segmentation   Paging
Need the programmer be aware the technique is being used?   Yes            No
How many linear address spaces are there?                   Many           1
Can the total address space exceed the size of phys. mem?   Yes            Yes
Can procedures and data be distinguished and
  separately protected?                                     Yes            No
Can tables whose size fluctuates be accommodated easily?    Yes            No
Is sharing of procedures between users facilitated?         Yes            No

Page Fault Handling

The MMU generates a page fault (protection violation or page not present). The page fault handler must:

– Save registers
– Figure out the virtual address that caused the fault
  – Often available in a hardware register
– If writing to the page just beyond the currently allocated stack:
  – Allocate a free page to add to the stack
  – Update the page table
  – Restart the instruction for the faulting process
    • Must undo any partial effects
– Else, signal or kill the process
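The stack-growth branch of the handler can be sketched as a toy model (the addresses, page size, and one-page growth policy are invented for illustration):

```python
PAGE = 0x1000
stack_low = 0x8000            # lowest currently allocated stack address

def handle_fault(addr, writing):
    """Grow the stack by one page if the faulting write is just below it."""
    global stack_low
    if writing and stack_low - PAGE <= addr < stack_low:
        stack_low -= PAGE     # allocate a free page, update the page table
        return "restart instruction"
    return "signal or kill process"

handle_fault(0x7FF8, writing=True)   # just below the stack: grow and restart
handle_fault(0x1000, writing=True)   # far below the stack: signal or kill
```

A write far from the stack is treated as a bug rather than growth, which is why the handler must distinguish the two cases.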