Paging: Basic Method, Hardware Support, TLB, Memory Protection – PowerPoint PPT Presentation



SLIDE 1

BS1 WS19/20 – topic-based slides

Paging

  • Basic method
  • Hardware support
  • TLB
  • Memory Protection
  • Hierarchical Page Tables
SLIDE 2
  • Hashed Page Tables
  • Inverted Page Tables
  • Shared Pages
  • Virtual Address space under different architectures
  • Paging Dynamics, Page faults
  • Working Sets
  • Page Replacement Strategies
  • Copy-On-Write Pages


SLIDE 3

Operating Systems 15

Paging

  • Dynamic storage allocation algorithms for varying-sized chunks of memory may lead to fragmentation
  • Solutions:
    a) Compaction – dynamic relocation of processes
    b) Noncontiguous allocation of process memory in equally sized pages (this avoids the memory-fitting problem)
  • Paging breaks physical memory into fixed-sized blocks – page frames
  • Logical memory is organized into pages of the same size
  • Paging is the process of dynamically mapping memory pages to page frames
SLIDE 4

Paging: Basic Method

  • When a process is executed, its pages are loaded into any available frames from backing store (disk)
  • Hardware support for paging consists of a page table
  • Logical addresses consist of page number and offset

[Figure: paging hardware – the CPU issues a logical address (page number p, offset d); the page table maps p to frame f; the physical address is (f, d)]

Page frames are typically 2–4 KB
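The page-number/offset split can be sketched in a few lines (a minimal sketch, not from the slides; `translate` and the tiny page table are hypothetical, and 4 KiB pages are assumed, so the offset is 12 bits):

```python
PAGE_SIZE   = 4096                        # 4 KiB pages
OFFSET_BITS = 12                          # 2**12 = 4096

def translate(logical_addr, page_table):
    """Split a logical address into (page number, offset) and map it
    through a single-level page table to a physical address."""
    p = logical_addr >> OFFSET_BITS       # page number
    d = logical_addr & (PAGE_SIZE - 1)    # offset within the page
    f = page_table[p]                     # frame number from the page table
    return (f << OFFSET_BITS) | d

# hypothetical page table: page 0 -> frame 5, page 1 -> frame 2
page_table = {0: 5, 1: 2}
```

For example, logical address 0x1ABC (page 1, offset 0xABC) maps to physical address 0x2ABC under this table.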

SLIDE 5

Paging Example

[Figure: paging example – pages 0–3 of logical memory are mapped through a page table to noncontiguous frames of physical memory]

SLIDE 6

Free Frames

[Figure: free frames before and after allocation – before: free frame list 7, 8, 10, 11, 13, 16; at process creation, the new process's pages are loaded into free frames and a new page table is built; after: the free frame list shrinks accordingly]

SLIDE 7

Paging: Hardware Support

  • Every memory access requires access to the page table
  • Page tables exist on a per-process basis
  • Page tables should be implemented in hardware
  • Small page tables can be just a set of registers
  • Problem: size of physical memory, # of processes
  • Page tables should be kept in memory
  • Only the base address of the page table is kept in a special register (cr3 on x86)
  • Problem: every reference now needs two memory accesses (page table + data)
  • Solution: Translation look-aside buffers (TLBs)
  • Associative registers store recently used page table entries
  • TLBs are fast, expensive, small: 8..2048 entries
  • TLB must be flushed on process context switches
SLIDE 8

Associative Memory

  • Associative memory – parallel search
  • Address translation for a pair (page #, frame #):
  • If the page # is in an associative register, get the frame # out
  • Otherwise get the frame # from the page table in memory

[Figure: associative memory as a table of (page #, frame #) pairs searched in parallel]

SLIDE 9

Paging Hardware With TLB

[Figure: paging hardware with TLB – the page number p is first looked up in the TLB (page # → frame #); on a TLB hit the frame f is used directly, on a TLB miss the page table in memory is consulted; the frame number f plus offset d form the physical address]

SLIDE 10

Effective Access Time with TLB

  • Associative lookup in the TLB = ε time units
  • Assume the memory cycle time is 1 μs, with ε << 1 μs
  • Hit ratio – percentage of times that a page number is found in the associative registers
  • The ratio is related to the number of associative registers and the process workload
  • Let us assume a hit ratio α
  • Effective Access Time (EAT):

EAT = (1 μs + ε) α + (2 μs + ε) (1 – α)
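The EAT formula can be checked numerically (a sketch; the ε = 0.02 μs lookup time and 98% hit ratio below are made-up illustration values):

```python
def eat_us(eps_us, alpha, mem_us=1.0):
    """Effective access time in microseconds, as on the slide:
    EAT = (mem + eps) * alpha + (2*mem + eps) * (1 - alpha).
    A hit costs one memory access plus the TLB lookup; a miss costs
    two memory accesses (page table + data) plus the TLB lookup."""
    return (mem_us + eps_us) * alpha + (2 * mem_us + eps_us) * (1 - alpha)
```

With ε = 0.02 μs and α = 0.98, EAT = 1.02·0.98 + 2.02·0.02 = 1.04 μs, i.e. only a 4% slowdown over raw memory.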

SLIDE 11

Memory Protection

  • Memory protection is implemented by associating control bits with each page frame reference in the page table
  • Isolation of processes in main memory
  • Valid–invalid bit attached to each entry in the page table:
  • “valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page
  • “invalid” indicates that the page is not in the process’ logical address space
  • Access to an invalid page produces a page fault
SLIDE 12

  • Invalid pages may simply be paged out

Valid (v) or Invalid (i) Bit in a Page Table

[Figure: pages 0–3 carry valid (v) bits and map to frames through the page table; the remaining page-table entries are marked invalid (i)]

SLIDE 13

Hierarchical Page Tables

  • Break up the logical address space into multiple page tables
  • “flat” page tables are too large
  • nested levels of page tables can be allocated as necessary
  • allows for large pages
  • A simple technique is a two-level page table
  • Used with 32-bit CPUs
SLIDE 14

  • A logical address (on a 32-bit machine with 4K page size) is divided into:
  • a page number consisting of 20 bits
  • a page offset consisting of 12 bits
  • Since the page table is paged, the page number is further divided into:
  • a 10-bit page number
  • a 10-bit page offset
  • Thus, a logical address is as follows:

  page number: p1 (10 bits) | p2 (10 bits) — page offset: d (12 bits)

where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table

Two-Level Paging Example
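One way to illustrate the 10 | 10 | 12 split (a sketch; the function name and test addresses are hypothetical):

```python
def split_two_level(addr):
    """Split a 32-bit logical address 10 | 10 | 12 into
    (page-directory index p1, page-table index p2, offset d)."""
    d  = addr & 0xFFF                 # low 12 bits: byte offset in the page
    p2 = (addr >> 12) & 0x3FF         # next 10 bits: inner page-table index
    p1 = (addr >> 22) & 0x3FF         # top 10 bits: outer (directory) index
    return p1, p2, d
```

For example, address 0x00402003 splits into p1 = 1, p2 = 2, d = 3.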

SLIDE 15

Two-Level Page-Table Scheme

[Figure: the outer page table (page directory) points to page tables, which in turn point to pages in memory]

SLIDE 16

Address-Translation Scheme

  • Address-translation scheme for a two-level 32-bit paging architecture

[Figure: the logical address p1 (10 bits) | p2 (10 bits) | d (12 bits) is resolved via the page directory (indexed by p1), the selected page table (indexed by p2), and the offset d into main memory]

SLIDE 17

Hashed Page Tables

  • Common in address spaces > 32 bits
  • IA64 supports hashed page tables
  • The virtual page number is hashed into a page table. This page table contains a chain of elements hashing to the same location
  • Virtual page numbers are compared in this chain, searching for a match. If a match is found, the corresponding physical frame is extracted
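The chained lookup described above might look like this (a minimal sketch, not an actual kernel structure; the class name and bucket count are hypothetical):

```python
class HashedPageTable:
    """Minimal sketch of a hashed page table: each bucket holds a
    chain of (virtual page number, frame) pairs hashing to that slot."""
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def insert(self, vpn, frame):
        self.buckets[hash(vpn) % len(self.buckets)].append((vpn, frame))

    def lookup(self, vpn):
        # walk the chain, comparing virtual page numbers until a match
        for entry_vpn, frame in self.buckets[hash(vpn) % len(self.buckets)]:
            if entry_vpn == vpn:
                return frame
        return None                   # no match -> page fault
```

With 8 buckets, virtual pages 7 and 15 collide in the same bucket, so the second lookup has to walk the chain past the first entry.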

SLIDE 18

Hashed Page Table

[Figure: hashed page table – the page number p is run through a hash function; the chain at that bucket is searched for an entry matching p; on a match, the frame f together with the offset d forms the physical address]

SLIDE 19

Inverted Page Table

  • One entry for each real page of memory
  • Entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page
  • Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs
  • Use a hash table to limit the search to one — or at most a few — page-table entries
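The linear search through an inverted table can be sketched as follows (hypothetical helper and table contents; a real implementation would use the hash table mentioned above to avoid scanning):

```python
def ipt_lookup(ipt, pid, vpn):
    """Search an inverted page table: one entry per physical frame,
    each holding (owning pid, virtual page number). The index of the
    matching entry is the frame number."""
    for frame, entry in enumerate(ipt):
        if entry == (pid, vpn):
            return frame
    return None                       # no entry -> page fault

# hypothetical table: entry i describes physical frame i
ipt = [(1, 0), (2, 0), (1, 3), None]  # frame 3 is unassigned
```

Note that the same virtual page number (here page 0) can appear for different processes, which is why the pid is part of the key.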

SLIDE 20

Inverted Page Table Architecture

[Figure: inverted page table – the logical address carries (pid, p, d); the table is searched for the entry matching (pid, p); the index i of that entry is the frame number, giving physical address (i, d)]

SLIDE 21

Shared Pages

  • Shared code
  • One copy of read-only (reentrant) code shared among processes (i.e., text editors, compilers, window systems)
  • Shared code must appear in the same location in the logical address space of all processes
  • Private code and data
  • Each process keeps a separate copy of the code and data
  • The pages for the private code and data can appear anywhere in the logical address space

SLIDE 22

Shared Pages Example

[Figure: shared pages example – processes 1 and 2 share the code pages cpp, cc1, cc2 (frames 1, 4, 11 in both page tables), while data1 (frame 7) and data2 (frame 8) are private to their processes]

SLIDE 23

Segmentation with Paging

The innovative MULTICS operating system introduced:

  • Logical addresses: 18-bit segment no, 16-bit offset
  • (relatively) small number of 64K segments
  • To eliminate fragmentation, segments are paged
  • A separate page table exists for each segment

[Figure: segmentation with paging – the segment number s indexes the segment table; the offset d is checked against the segment length (trap if d >= length), then split into (p, d′); p indexes the page table for segment s, yielding frame f and physical address (f, d′)]

SLIDE 24

Intel 80386 Address Translation

[Figure: Intel 80386 address translation – a logical address (selector, offset) is translated via the descriptor table (base + limit) into a 32-bit linear address; the linear address is split 10 | 10 | 12 into page directory index, page table index, and byte offset; cr3 points to the per-process page directory (1024 × 4-byte entries); a 4 KB PDE points to a page table of 1024 PTEs mapping 4 KiB frames, while a 4 MB PDE maps a 4 MiB frame directly with a 22-bit offset]

The Intel 386 uses segmentation with paging for memory management with a two-level paging scheme.

SLIDE 25

  • Mapping via page table entries
  • Indirect relationship between virtual pages and physical memory

Address Translation

[Figure: virtual pages mapped to physical memory through page table entries; on x86 the 32-bit address splits 10 | 10 | 12 into page directory index, page table index, and byte index; the address space is divided into user and system parts]

SLIDE 26

32-bit x86 Virtual Address Space

  • 2 GB per-process
  • Address space of one process is not directly reachable from other processes
  • 2 GB system-wide
  • The operating system is loaded here, and appears in every process’s address space
  • The operating system is not a process (though there are processes that do things for the OS, more or less in “background”)
  • 2 GB of addressable memory per process is a serious limitation (think compilers, browsers)

[Figure: 32-bit x86 layout – 00000000–7FFFFFFF: code (EXE/DLLs) and data (EXE/DLL static storage, per-thread user-mode stacks, process heaps, etc.), unique per process, accessible in user or kernel mode; 80000000–FFFFFFFF: system-wide, accessible only in kernel mode – code (NTOSKRNL, HAL, drivers) and data (kernel stacks, file system cache, non-paged and paged pool); at C0000000: process page tables and hyperspace, per process, kernel mode only]

SLIDE 27

Virtual Address Space Allocation

  • Virtual address space is sparse
  • Address spaces contain reserved, committed, and unused regions
  • Not every virtual address is mapped onto physical memory
  • Unit of protection and usage is one page
  • On x86, the default page size is 4 KB (x86 supports 4 KB or 4 MB pages)
  • On x64, the default page size is 4 KB (large pages are 2 MB)
  • Reserved regions are referenced in page tables
  • Committed regions are backed by physical memory (RAM or Storage)
SLIDE 28

3GB Process Space Option

  • Provides 3 GB per-process address space
  • Commonly used by database servers (for file mapping)
  • .EXEs must have the “large address space aware” flag in the image header, or they’re limited to 2 GB (specify at link time or with imagecfg.exe from the ResKit)
  • Chief “loser” in system space is the file system cache
  • The system can still only utilize at most 4 GB of physical RAM

[Figure: 3 GB layout – 00000000–BFFFFFFF: .EXE code, globals, per-thread user-mode stacks, .DLL code, process heaps; unique per process (= per application), user mode. C0000000–FFFFFFFF: exec, kernel, HAL, drivers, etc. (system-wide, kernel mode only) and process page tables/hyperspace (per process, kernel mode only)]

SLIDE 29

Physical Memory Usage in PAE Mode

  • Virtual address space is still 4 GB, so how can you “use” more than 4 GB of memory?

1. Although each process can only address 2 GB, many may be in memory at the same time (e.g. 5 × 2 GB processes = 10 GB)
2. Files in the system cache remain in physical memory: the memory manager keeps unmapped data in physical memory

[Figure: 64 GB of physical memory divided into the system working set, memory assigned to the virtual cache, the standby list (960 MB), and other (~60 GB)]

SLIDE 30

Address Windowing Extensions

  • AWE functions allow Windows processes to allocate large amounts of physical memory and then map “windows” into that memory
  • Applications: database servers can cache large databases
  • Up to the programmer to control
  • Like memory banks in embedded controllers
  • No longer necessary on 64-bit systems

[Figure: AWE – regions of physical memory are mapped in and out of a window in the process’s virtual memory]

SLIDE 31

64-bit Virtual Address Space (ia64)

  • 64 bits = 2^64 ≈ 17 billion GB (16 exabytes) total
  • (Diagram not to scale)
  • 7152 GB default per-process
  • Pages are 8 KB
  • All pointers are now 64 bits wide

[Figure: ia64 layout – user-mode space per process starting at 00000000 00000000; kernel-mode per-process region and process page tables; session space and session-space page tables; system space and system-space page tables, up to FFFFFFFF FFFFFFFF]

SLIDE 32

Process vs. System-Space Addresses

  • The “upper half” of the page directory of every process contains the same entries (with a few exceptions), which point to system-wide page tables
  • The exceptions are, for example, the page tables that map the process page tables

[Figure: page directories (one per process) point to per-process sets of page tables and to system-wide page tables]

SLIDE 33

Exercise – Size of Page Tables

  • Given a computer system with 40-bit words and 40-bit addresses, single-level page table address translation, and a page size of 8 KiB with 5 status bits per page table entry:
  • What is the size of the virtual address space?
  • How many entries does a single page table have?
  • What is the size of a page table entry in bits?
  • How much memory does a single page table encompass?

– individual entries take a multiple of the machine word size
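One way to check your answers (a sketch that assumes byte-addressable memory, which the exercise leaves open; variable names are ours):

```python
# Assumption: memory is byte-addressable, so a 40-bit address spans 2**40 bytes.
ADDR_BITS   = 40
OFFSET_BITS = 13                                  # 8 KiB page = 2**13 bytes
STATUS_BITS = 5
WORD_BITS   = 40                                  # machine word size

vas_bytes   = 2 ** ADDR_BITS                      # virtual address space: 1 TiB
entries     = 2 ** (ADDR_BITS - OFFSET_BITS)      # 2**27 page-table entries
entry_bits  = (ADDR_BITS - OFFSET_BITS) + STATUS_BITS   # 27-bit frame no + 5 status = 32
entry_words = -(-entry_bits // WORD_BITS)         # round up: one 40-bit word
table_bytes = entries * entry_words * WORD_BITS // 8    # 2**27 * 5 B = 640 MiB
```

Rounding each 32-bit entry up to a full 40-bit word is what the hint about machine word size refers to; it is why the table costs 640 MiB rather than 512 MiB.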

SLIDE 34

Page directories & Page tables

  • Each process has a single page directory (phys. addr. in the KPROCESS block, at 0xC0300000, in cr3 (x86))
  • cr3 is re-loaded on inter-process context switches
  • The page directory is composed of page directory entries (PDEs) which describe the state/location of the page tables for this process
  – Page tables are created on demand
  • x86: 1024 page tables describe 4 GB
  • Each process has a private set of page tables
  • The system has one set of page tables
  • System PTEs are a finite resource: computed at boot time
SLIDE 35

Page Table Entries

  • Page tables are arrays of Page Table Entries (PTEs)
  • Valid PTEs have two fields:
  • Page Frame Number (PFN)
  • Flags describing state and protection of the page

[Figure: x86 PTE layout – bits 31..12: page frame number; low bits: valid, write (writable on MP systems), owner, write through, cache disabled, accessed, dirty, large page (if PDE), global, plus reserved bits; reserved bits are used only when the PTE is not valid]

SLIDE 36

PTE Status and Protection Bits (x86)

Name of bit – meaning on x86:
  • Accessed – page has been read
  • Cache disabled – disables caching for that page
  • Dirty – page has been written to
  • Global – translation applies to all processes (a translation buffer flush won’t affect this PTE)
  • Large page – indicates that the PDE maps a 4 MB page (used to map the kernel)
  • Owner – indicates whether user-mode code can access the page or whether the page is limited to kernel-mode access
  • Valid – indicates whether the translation maps to a page in physical memory
  • Write through – disables caching of writes; immediate flush to disk
  • Write – uniprocessor: indicates whether the page is read/write or read-only; multiprocessor: indicates whether the page is writable (write bit in a reserved bit)

SLIDE 37

Invalid PTEs and their structure

  • Page file: the desired page resides in the paging file
  • an in-page operation is initiated
  • Demand zero: the pager looks at the zero page list; if the list is empty, the pager takes a page from the standby list and zeroes it
  • PTE format as shown below, but page file number and offset are zero

[Figure: invalid PTE (page-file format) – bits 31..12: page file offset; protection bits; page file number; transition, prototype, and valid bits (valid = 0)]

SLIDE 38

Invalid PTEs and their structure (contd.)

  • Transition: the desired page is in memory on either the standby, modified, or modified-no-write list
  • The page is removed from the list and added to the working set
  • Unknown: the PTE is zero, or the page table does not yet exist
  • examine the virtual address space descriptors (VADs) to see whether this virtual address has been reserved
  • Build page tables to represent newly committed space

[Figure: invalid PTE (transition format) – bits 31..12: page frame number; transition = 1, prototype = 0, valid = 0; protection, cache disable, write through, owner, and write bits]

SLIDE 39

Managing Physical Memory

  • The system keeps unassigned physical pages on one of several lists
  • Free page list
  • Modified page list
  • Standby page list
  • Zero page list
  • Bad page list – pages that failed the memory test at system startup
  • Lists are implemented by entries in the “PFN database”
  • Maintained as FIFO lists or queues
SLIDE 40

Paging Dynamics

[Figure: paging dynamics – pages move between the process working sets and the standby, modified, free, zero, and bad page lists: page reads from disk and kernel allocations take pages from the free/zero lists; demand-zero faults take from the zero list; working-set replacement moves pages to the standby or modified list; the modified page writer moves pages from the modified to the standby list; the zero page thread moves free pages to the zero list; “soft” page faults bring standby/modified pages back into a working set; private pages go to the free list at process exit]

SLIDE 41

Paging Dynamics

  • New pages are allocated to working sets from the top of the free or zero page list
  • Pages released from the working set due to working-set replacement go to the bottom of:
  • The modified page list (if they were modified while in the working set)
  • The standby page list (if not modified)
  • The decision is made based on the “D” (dirty = modified) bit in the page table entry
  • The association between the process and the physical page is still maintained while the page is on either of these lists

SLIDE 42

Standby and Modified Page Lists

  • Modified pages go to the modified (dirty) list
  • Avoids writing pages back to disk too soon
  • Unmodified pages go to the standby (clean) list
  • They form a system-wide cache of “pages likely to be needed again”
  • Pages can be faulted back into a process from the standby and modified page lists
  • These are counted as page faults, but not page reads
SLIDE 43

Modified Page Writer

  • When the modified list reaches a certain size, the modified page writer system thread is awoken to write pages out
  • See MmModifiedPageMaximum
  • Also triggered when memory is overcommitted (too few free pages)
  • Does not flush the entire modified page list
  • Two system threads
  • One for mapped files, one for the paging file
  • Pages move from the modified list to the standby list
  • E.g. they can still be soft-faulted into a working set
SLIDE 44

Free and Zero Page Lists

  • Free Page List
  • Used for page reads
  • Private modified pages go here on process exit
  • Pages contain junk in them (i.e. not zeroed)
  • On most busy systems, this is empty
  • Zero Page List
  • Used to satisfy demand-zero page faults
  – References to private pages that have not been created yet
  • When the free page list has 8 or more pages, a priority-zero thread is awoken to zero them
  • On most busy systems, this is empty too
SLIDE 45

Paging Dynamics

[Figure: paging dynamics – same diagram as on the earlier Paging Dynamics slide]

SLIDE 46

Page Frame Number-Database

  • One entry (24 bytes) for each physical page
  • Describes the state of each page in physical memory
  • Entries for active/valid and transition pages contain:
  • Original PTE value (to restore when paged out)
  • Original PTE virtual address and container PFN
  • Working set index hint (for the first process...)
  • Entries for other pages are linked into:
  • Free, standby, modified, zeroed, bad lists (a parity error will kill the kernel)
  • Share count (active/valid pages):
  • Number of PTEs which refer to that page; 1→0: candidate for the free list
  • Reference count:
  • Locking for I/O: INC when share count 1→0; DEC when unlocked
  • Share count = 0 & reference count = 1 is possible
  • Reference count 1→0: page is inserted into the free, standby, or modified lists
SLIDE 47

PFN – states of pages in physical memory

Status – description:
  • Active/valid – page is part of a working set (system/process); a valid PTE points to it
  • Transition – page not owned by a working set and not on any paging list; I/O is in progress on this page
  • Standby – page belonged to a working set but was removed; not modified
  • Modified – removed from a working set, modified, not yet written to disk
  • Modified no-write – modified page that will not be touched by the modified page writer; used by NTFS for pages containing log entries (explicit flushing)
  • Free – page is free but has dirty data in it; cannot be given to a user process (C2 security requirement)
  • Zeroed – page is free and has been initialized by the zero page thread
  • Bad – page has generated parity or other hardware errors

SLIDE 48

Page tables and page frame database

[Figure: page tables and the PFN database – valid PTEs in the page tables of processes 1–3 point to Active entries in the PFN database; invalid-transition PTEs point to Standby or Modified entries; invalid PTEs with disk addresses and prototype PTEs are also shown, alongside the Zeroed, Free, Bad, and Modified-no-write states]

SLIDE 49

Page Fault Handling

  • A reference to an invalid page is called a page fault
  • The kernel trap handler dispatches:
  • The memory manager fault handler (MmAccessFault) is called
  • Runs in the context of the thread that incurred the fault
  • Attempts to resolve the fault, or raises an exception
  • Page faults can be caused by a variety of conditions
  • Four basic kinds of invalid Page Table Entries (PTEs)
SLIDE 50

In-Paging I/O due to Access Faults

  • Accessing a page that is not resident in memory but on disk in the page file / a mapped file
  • Allocate memory and read the page from disk into the working set
  • Occurs when a read operation must be issued to a file to satisfy a page fault
  • Page tables are pageable -> additional page faults possible
  • In-page I/O is synchronous
  • The thread waits until the I/O completes
  • Not interruptible by asynchronous procedure calls
  • During in-page I/O, the faulting thread does not own critical memory management synchronization objects

Other threads in the process may issue VM functions, but:

  • Another thread could have faulted the same page: collided page fault
  • The page could have been deleted (remapped) from the virtual address space
  • Protection on the page may have changed
  • The fault could have been for a prototype PTE, and the page that maps the prototype PTE could have been paged out of the working set
SLIDE 51

Reasons for access faults

  • Accessing a page that is on the standby or modified list
  • Transition the page to the process or system working set
  • Accessing a page that has no committed storage
  • Access violation
  • Accessing a kernel page from user mode
  • Access violation
  • Writing to a read-only page
  • Access violation
SLIDE 52

Reasons for access faults (contd.)

  • Writing to a guard page
  • Guard page violation (if a reference to a user-mode stack, perform automatic stack expansion)
  • Writing to a copy-on-write page
  • Make a process-private copy of the page and replace the original in the process or system working set
  • Referencing a page in system space that is valid but not in the process page directory (if paged pool expanded after the process directory was created)
  • Copy the page directory entry from the master system page directory structure and dismiss the exception
  • On a multiprocessor system: writing to a valid page that has not yet been written to
  • Set the dirty bit in the PTE
SLIDE 53

Reserving & Committing Memory

  • Optional 2-phase approach to memory allocation:
  • Reserve address space (in multiples of the page size)
  • Commit storage in that address space
  • Can be combined in one call (VirtualAlloc, VirtualAllocEx)
  • Reserved memory:
  • Range of virtual addresses reserved for future use (contiguous buffer)
  • Accessing reserved memory results in an access violation
  • Fast, inexpensive
  • Committed memory:
  • Has backing store (pagefile.sys, memory-mapped file)
  • Either private or mapped into a view of a section
  • Decommit via VirtualFree, VirtualFreeEx

A thread’s user-mode stack is constructed using this 2-phase approach: the initial reserved size is 1 MB, but only 2 pages are committed: stack & guard page
SLIDE 54

Soft vs. Hard Page Faults

  • Types of “soft” page faults:
  • Pages can be faulted back into a process from the standby and modified page lists
  • A shared page that’s valid for one process can be faulted into other processes
  • Some hard page faults are unavoidable
  • Process startup (loading of EXE and DLLs)
  • Normal file I/O is done via paging; cached files are faulted into the system working set
  • To determine paging vs. normal file I/Os:
  • Monitor Memory -> Page Reads/sec (not Memory -> Page Faults/sec, as that includes soft page faults)
  • Subtract System -> File Read Operations/sec from Page Reads/sec
  • Or, use Filemon to determine which file(s) are having paging I/O (asterisk next to the I/O function)
  • Should not stay high for a sustained period
SLIDE 55

Working Set

  • Working set: all the physical pages “owned” by a process
  • Essentially, all the pages the process can reference without incurring a page fault
  • Working set limit: the maximum number of pages the process can own
  • When the limit is reached, a page must be released for every page that’s brought in (“working set replacement”)
  • Default upper limit on size for each process
  • System-wide maximum calculated & stored in MmMaximumWorkingSetSize: approximately RAM minus 512 pages (2 MB on x86) minus the minimum size of the system working set (1.5 MB on x86)
  • Interesting to view (gives you an idea how much memory you’ve “lost” to the OS)
  • True upper limit: 2 GB minus 64 MB for 32-bit Windows
SLIDE 56

Working Set List

  • A process always starts with an empty working set
  • It then incurs page faults when referencing a page that isn’t in its working set
  • Many page faults may be resolved from memory (to be described later)

[Figure: working set list ordered from older pages to newer pages; its size is visible as the PerfMon Process “WorkingSet” counter]
SLIDE 57

Birth of a Working Set

  • Pages are brought into memory as a result of page faults
  • If the page is not in memory, the appropriate block in the associated file is read in:
  • A physical page is allocated
  • The block is read into the physical page
  • The page table entry is filled in
  • The exception is dismissed
  • The processor re-executes the instruction that caused the page fault (and this time, it succeeds)
  • The page has now been “faulted into” the process “working set”
SLIDE 58

Process Creation

  • New processes need three pages:
  • The page directory
    – Points to itself
    – Maps the page table of the hyperspace
    – Maps system paged and nonpaged areas
    – Maps system cache page table pages
  • The page table page for the working set
  • The page for the working set list
SLIDE 59

Page Replacement: First In First Out (FIFO)

  • Access to memory pages, in this order: 3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4
  • If a page needs to be replaced, evict the oldest page

  Reference:  3  2  1  0  3  2  4  3  2  1  0  4
  Frame 1:    3  3  3  0  0  0  4  4  4  4  4  4
  Frame 2:       2  2  2  3  3  3  3  3  1  1  1
  Frame 3:          1  1  1  2  2  2  2  2  0  0

(9 page faults; the references to 3 and 2 after loading 4, and the final reference to 4, are hits)

SLIDE 60

Bélády's anomaly – FIFO anomaly

  • Under FIFO, increasing the number of page frames can result in an increase in the number of page faults for a given memory access pattern

For the reference string 3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4:

  3 frames (9 page faults):
  Reference:  3  2  1  0  3  2  4  3  2  1  0  4
  Frame 1:    3  3  3  0  0  0  4  4  4  4  4  4
  Frame 2:       2  2  2  3  3  3  3  3  1  1  1
  Frame 3:          1  1  1  2  2  2  2  2  0  0

  4 frames (10 page faults):
  Reference:  3  2  1  0  3  2  4  3  2  1  0  4
  Frame 1:    3  3  3  3  3  3  4  4  4  4  0  0
  Frame 2:       2  2  2  2  2  2  3  3  3  3  4
  Frame 3:          1  1  1  1  1  1  2  2  2  2
  Frame 4:             0  0  0  0  0  0  1  1  1
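The anomaly can be demonstrated by counting faults for both frame counts (a self-contained sketch with a minimal FIFO simulator):

```python
def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement."""
    queue, faults = [], 0
    for page in refs:
        if page not in queue:
            faults += 1
            if len(queue) == nframes:
                queue.pop(0)          # evict the oldest page
            queue.append(page)
    return faults

refs = [3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4]
faults_3 = fifo_faults(refs, 3)       # 9 faults with 3 frames
faults_4 = fifo_faults(refs, 4)       # 10 faults with 4 frames: the anomaly
```

Adding a frame makes this particular access pattern fault more often, which is exactly Bélády's anomaly.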

SLIDE 61

Page Replacement: Least Recently Used (LRU)

  • If a page needs to be replaced, evict the least recently used page
  • Expensive: every read access requires a write to a timestamp
  • Expensive: finding the page to replace needs lots of comparisons

  Reference:  3  2  1  0  3  2  4  3  2  1  0  4
  Frame 1:    3  3  3  0  0  0  4  4  4  1  1  1
  Frame 2:       2  2  2  3  3  3  3  3  3  0  0
  Frame 3:          1  1  1  2  2  2  2  2  2  4
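A list-based LRU simulation of the same reference string (a sketch; real hardware avoids exactly this per-access bookkeeping cost, as the slide notes):

```python
def lru_faults(refs, nframes):
    """Count page faults under LRU replacement."""
    frames = []                       # index 0 = least recently used
    faults = 0
    for page in refs:
        if page in frames:
            frames.remove(page)       # refresh: page becomes most recent
        else:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)         # evict the least recently used page
        frames.append(page)           # most recently used at the end
    return faults

refs = [3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4]
```

With 3 frames this string incurs 10 faults under LRU (one more than FIFO here; neither policy dominates on every workload).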

SLIDE 62

Page Replacement: Second Chance

  • Extension to FIFO, favoring frequently accessed pages
  • Set the “accessed” bit to 1 on page access (not on page load)
  • If a page needs to be replaced, evict the oldest page with an unset accessed bit, then unset all accessed bits

  Reference:  3  2  1  0  3  2  4  3  2  1  0  4
  Frame 1:    3  3  3  0  0  0  4  4  4  1  1  1
  Frame 2:       2  2  2  3  3  3  3  3  3  0  0
  Frame 3:          1  1  1  2  2  2  2  2  2  4
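The slide's second-chance variant (clear all accessed bits after an eviction; loading a page does not set its bit) can be simulated as follows (`second_chance_faults` is a hypothetical helper):

```python
def second_chance_faults(refs, nframes):
    """Count page faults under the slide's second-chance variant:
    evict the oldest page with an unset accessed bit, then clear
    all accessed bits; loading a page does not set its bit."""
    queue = []            # FIFO order, index 0 = oldest
    accessed = {}
    faults = 0
    for page in refs:
        if page in queue:
            accessed[page] = True              # bit is set on access only
        else:
            faults += 1
            if len(queue) == nframes:
                # oldest page with an unset bit; fall back to the oldest
                victim = next((p for p in queue if not accessed[p]), queue[0])
                queue.remove(victim)
                accessed = {p: False for p in queue}   # clear all bits
            queue.append(page)
            accessed[page] = False
    return faults

refs = [3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4]
```

On this reference string the variant makes the same eviction choices as LRU (10 faults), which matches the identical frame traces on the two slides.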

SLIDE 63

Prototype Page Table Entries (PTEs)

  • Software structure to manage page-table redundancy for potentially shared pages
  • An array of prototype PTEs is created as part of the shared memory object
  • Provides a reference count for shared pages
  • Shared page valid:
  • process PTE & prototype PTE point to the physical page
  • Page invalidated:
  • process PTE points to the prototype PTE
  • The prototype PTE describes 5 states for a shared page:
  • Active/valid, Transition, Demand zero, Page file, Mapped file
  • A layer between the page table and the page frame number database
SLIDE 64

Prototype PTEs – the bigger picture

  • Two virtual pages in a mapped view
  • The first page is valid; the 2nd page is invalid and in the page file
  • The prototype PTE contains the exact location
  • The process PTE points to the prototype PTE

[Figure: the page directory and page table hold a valid PTE (PFN n) for the first page and an invalid PTE pointing to the prototype PTE for the second; the prototype page table marks the first page valid (PFN n) and the second invalid – in the page file; the PFN entry for the physical page records the PTE address and share count = 1]

SLIDE 65

Copy-On-Write Pages

  • Pages are originally set up as shared, read-only
  • Access violation on a write attempt
  – the pager makes a copy of the page and allocates it privately to the process doing the write
  • So, unique copies are only needed for the pages in the shared region that are actually written (an example of “lazy evaluation”)
  • Original values of the data are still shared
  – e.g. writable data initialized with C initializers
  • Data is copied to a new physical page, but the virtual addresses are unaltered

SLIDE 66

Copy-On-Write Pages

[Figure: before the write – both process address spaces map the original data to the same physical pages (pages 1–3); the original data page is shared]

SLIDE 67

Copy-On-Write Pages

[Figure: after the write – the writing process’s address space now maps its modified data to a private copy of page 2; the other process still maps the original data]

SLIDE 68

Working Set Replacement

  • When the working set maximum is reached (or a working set trim occurs), the process must give up pages to make room for new pages
  • a single process cannot take over all of physical memory unless other processes aren’t using it
  • Question: Which pages to remove?

Pages removed from the process working set go to the standby or modified page list
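A minimal trimming sketch, with invented names and an assumed oldest-first eviction order (the real replacement policy is more involved): when the working set hits its maximum, the evicted page goes to the modified list if it is dirty, otherwise to the standby list.

```python
# Working-set trim sketch: evict the oldest page once WS_MAX is reached;
# dirty pages go to the modified list, clean pages to the standby list.
from collections import OrderedDict

WS_MAX = 3
working_set = OrderedDict()          # page -> dirty flag, oldest first
modified_list, standby_list = [], []

def touch(page, write=False):
    if page in working_set:
        working_set.move_to_end(page)                     # refresh recency
        working_set[page] = working_set[page] or write    # sticky dirty bit
        return
    if len(working_set) >= WS_MAX:
        victim, dirty = working_set.popitem(last=False)   # evict oldest page
        (modified_list if dirty else standby_list).append(victim)
    working_set[page] = write

# Access pages 1..5; page 2 is written, so it leaves dirty.
for page, write in [(1, False), (2, True), (3, False), (4, False), (5, False)]:
    touch(page, write=write)
print(sorted(working_set))           # [3, 4, 5]
print(modified_list, standby_list)   # [2] [1]
```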

slide-69
SLIDE 69

Operating Systems 17

Balance Set Manager

  • Balance set = sum of all working sets
  • Balance Set Manager is a system thread
  • Wakes up every second. If paging activity is high or memory is needed:
– trims working sets of processes
– if a thread is in a long user-mode wait, marks its kernel stack pages as pageable
– if a process has no nonpageable kernel stacks, “outswaps” the process
– triggers a separate thread to do the “outswap” by gradually reducing the target process’s working set limit to zero

  • Evidence: Look for threads in “Transition” state in PerfMon
  • Means that the kernel stack has been paged out, and the thread is waiting for memory to be allocated so it can be paged back in

slide-70
SLIDE 70

Operating Systems 22

System Working Set

  • Just as processes have working sets, Windows’ pageable system-space code and data lives in the “system working set”

  • Made up of 4 components:
  • Paged pool
  • Pageable code and data in the executive
  • Pageable code and data in kernel-mode drivers, Win32K.Sys, graphics drivers, etc.
  • Global file system data cache
  • To get physical (resident) size of these with PerfMon, look at:
  • Memory | Pool Paged Resident Bytes
  • Memory | System Code Resident Bytes
  • Memory | System Driver Resident Bytes
  • Memory | System Cache Resident Bytes
  • The Memory | Cache Bytes counter is the total of these four “resident” (physical) counters (not just the cache; in NT4, the same as “File Cache” on the Task Manager / Performance tab)
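The relationship between Cache Bytes and the four resident counters is just a sum; the values below are made up for illustration, real numbers come from PerfMon:

```python
# Hypothetical PerfMon readings (Memory object); Cache Bytes is their sum.
resident = {
    "Pool Paged Resident Bytes":    50 * 2**20,
    "System Code Resident Bytes":   10 * 2**20,
    "System Driver Resident Bytes":  8 * 2**20,
    "System Cache Resident Bytes": 120 * 2**20,
}
cache_bytes = sum(resident.values())
print(cache_bytes // 2**20, "MB")  # 188 MB
```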

slide-71
SLIDE 71

Operating Systems 24

Working Sets in Memory

  • As processes incur page faults, pages are removed from the free, modified, or standby lists and made part of the process working set
  • A shared page may be resident in several processes’ working sets at one time (this case is not illustrated here)

[Figure: the user (00000000–7FFFFFFF) and system (80000000 and up) halves of the address spaces of processes 1–3 map onto pages in physical memory; each physical page is marked free (F), modified (M), standby (S), or as belonging to the working set of process 1, 2, or 3.]

slide-72
SLIDE 72

Operating Systems 25

Page Files / Swap Space

  • Page file is a hidden file on a Windows volume (pagefile.sys)
  • What gets sent to the paging file?
– no code – only modified data
– code can be re-read from the image file at any time, so there is no benefit to creating a copy in the page file
  • When do pages get paged out?
– Only when necessary
  • Page file space is only reserved at the time pages are written out
  • Once a page is written to the paging file, the space is occupied until the memory is deleted (e.g. at process exit), even if the page is read back from disk

slide-73
SLIDE 73

Operating Systems 27

Sizing the Page File / Swap Partition

  • Given this understanding of page file usage, how big should the total paging file space be?
  • Size should depend on the total private virtual memory used by applications and drivers
  • Therefore, it is not related to RAM size, except for taking a full memory dump
  • Hibernation on Windows uses a separate file to persist memory (hiberfil.sys)
  • Worst case: the system has to page all private data out to make room for code pages
  • To handle this, the minimum size should be the maximum observed VM usage (“Commit Charge Peak”)
  • Hard disk space is cheap, so why not double this?
  • A page file can grow; a swap partition is statically sized
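The sizing rule above amounts to a one-line calculation; the observed peak value here is an assumption for illustration:

```python
# Slide's rule of thumb: minimum page file size = observed Commit Charge Peak;
# since disk space is cheap, double it for headroom.
commit_charge_peak = 3500 * 2**20       # assumed observed peak: 3500 MB
page_file_min = commit_charge_peak
page_file_recommended = 2 * commit_charge_peak
print(page_file_recommended // 2**20, "MB")  # 7000 MB
```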
slide-74
SLIDE 74

Operating Systems 28

Contiguous Page Files

  • Page file / swap space will fragment with use
  • Pages are not written contiguously
  • The number of fragments will eventually impact paging performance
  • Can defrag with the PageDefrag tool on Windows (freeware – www.sysinternals.com)
  • It defragments on the next reboot
slide-75
SLIDE 75

Operating Systems 29

When Page Files are Full

  • When page file space runs low:
  • 1. “System running low on virtual memory”
– First time: before page file expansion
– Second time: when committed bytes approach the commit limit
  • 2. “System out of virtual memory”
– Page files are full
  • Look for who is consuming page file space:
– Process memory leak: check Task Manager, Processes tab, VM Size column, or the Perfmon “Private Bytes” counter (the same counter)
– Paged pool leak: check paged pool size
– Run poolmon to see what object(s) are filling the pool
– Could be a result of processes not closing handles – check the process “handle count” in Task Manager