SLIDE 1

Memory Management in Linux

By: Rohan Garg 2002134 Gaurav Gupta 2002435

SLIDE 2

Architecture Independent Memory Model

The process virtual address space is divided into pages. The page size is given by the PAGE_SIZE macro in asm/page.h (4 KB for x86 and 8 KB for Alpha).

The pages are divided among 4 segments: User Code, User Data, Kernel Code, and Kernel Data. In User mode, only User Code and User Data can be accessed; in Kernel mode, access to User Data is also needed.

SLIDE 3

Addressing the page table

Segment + Offset = 32-bit linear address covering 4 GB. Of this, user space = 3 GB (defined by the TASK_SIZE macro) and kernel space = 1 GB.

A linear address is converted to a physical address using 3 levels: an index into the Page Directory, an index into the Page Middle Directory, an index into the Page Table, and finally the page offset.

SLIDE 4

Requesting and Releasing Page Frames

alloc_pages(gfp_mask, order): used to request 2^order contiguous page frames.

alloc_page(gfp_mask): returns the address of the descriptor of the allocated page frame; for one page only.

__get_free_pages(gfp_mask, order): returns the linear address of the first allocated page.

get_zeroed_page(gfp_mask): first invokes alloc_pages and then fills the page with zeros.

__get_dma_pages(gfp_mask, order): gets page frames suitable for DMA.

SLIDE 5

GFP mask

The flag specifies how to look for free page frames. E.g. __GFP_WAIT: the kernel is allowed to block the current process while waiting for free page frames.

SLIDE 6

Freeing page frames

__free_pages(page, order): decreases the count field of the page descriptor by 1; if it reaches 0, frees the 2^order contiguous page frames.

free_pages(addr, order): same, but takes the linear address addr of the first page instead of its descriptor.

__free_page(page): releases the single page frame whose page descriptor is page.

free_page(addr): releases the single page frame at linear address addr.

SLIDE 7

Finding a Physical Page

  • unsigned long __get_free_pages(int priority, unsigned long order, int dma) in mm/page_alloc.c
  • priority =
  • GFP_BUFFER (free page returned only if available in physical memory)
  • GFP_ATOMIC (return page if possible, do not interrupt current process)
  • GFP_USER (current process can be interrupted)
  • GFP_KERNEL (kernel can be interrupted)
  • GFP_NOBUFFER (do not attempt to reduce the buffer cache)
  • order says give me 2^order pages (max is 128 KB)
  • dma specifies that it is for DMA purposes
SLIDE 8

Page descriptor

  • Used to keep track of the current status of each page frame.
  • Some of the key fields of the structure are described below:
  • list: pointers to the next and previous items in a doubly linked list of page descriptors.
  • count: usage reference counter for the page; a value greater than 0 implies the page frame is in use by one or more processes.
  • flags: describes the status of the page frame.
  • lru: pointers into the least-recently-used doubly linked lists of pages.
  • zone: the zone to which the page frame belongs.
SLIDE 9

Buddy System Algorithm

Used for allocating groups of contiguous page frames; helps solve the problem of external fragmentation.

All free page frames are grouped into lists of blocks containing groups of 1, 2, 4, 8, …, 512 contiguous page frames.

If 128 contiguous page frames are required, the 128-frame list is consulted first. If it is empty, the 256-frame list is consulted; if a block is found there, 128 frames satisfy the request and the remaining 128 are added to the 128-frame list. Otherwise the 512-frame list is consulted, and so on.

SLIDE 10

Slab Allocator

Runs on top of the basic "buddy heap algorithm". It does not discard freed objects but keeps them in memory, thus avoiding reinitialization.

Creates pools of memory areas of the same type, called caches.

Caches are divided into slabs, each slab consisting of one or more contiguous page frames.

The slab allocator never releases the page frames of an empty slab unless the kernel is looking for additional free page frames.

SLIDE 11

Interface between slab allocator and buddy system.

  • void *kmem_getpages(kmem_cache_t *cachep, unsigned long flags)
    {
        void *addr;
        flags |= cachep->gfpflags;
        addr = (void *) __get_free_pages(flags, cachep->gfporder);
        return addr;
    }
  • The slab allocator invokes this function to call the buddy system algorithm to obtain a group of free contiguous page frames.
  • Similarly, kmem_freepages() is used by the slab allocator to release a group of page frames.

SLIDE 12

Process Address Space

[Diagram] Layout of a process address space, from the top down:

0xC0000000  Kernel
            File name, Environment, Arguments
            Stack (grows downward)
            bss  (_end down to _bss_start)
            Data (_edata down to _etext)
            Code
            Header
0x84000000  Shared Libs

SLIDE 13

Address Space Descriptor

  • mm_struct is defined in the process descriptor (in linux/sched.h).
  • It is shared between parent and child if CLONE_VM is specified on forking; otherwise it is duplicated.
  • struct mm_struct {
        int count;                 // no. of processes sharing this descriptor
        pgd_t *pgd;                // page directory ptr
        unsigned long start_code, end_code;
        unsigned long start_data, end_data;
        unsigned long start_brk, brk;
        unsigned long start_stack;
        unsigned long arg_start, arg_end, env_start, env_end;
        unsigned long rss;         // no. of pages resident in memory
        unsigned long total_vm;    // total # of bytes in this address space
        unsigned long locked_vm;   // # of bytes locked in memory
        unsigned long def_flags;   // status to use when mem regions are created
        struct vm_area_struct *mmap;     // ptr to first region desc.
        struct vm_area_struct *mmap_avl; // faster search of region desc.
    };

SLIDE 14

Memory Allocation for Kernel Segment

  • Static
    memory_start = console_init(memory_start, memory_end);
    Typically done for drivers to reserve areas, and for some other kernel components.
  • Dynamic
    void *kmalloc(size, priority), void kfree(void *)
    void *vmalloc(size), void vfree(void *)
    kmalloc is used for physically contiguous pages, while vmalloc does not necessarily allocate physically contiguous pages. Memory allocated is not initialized (and is not paged out).

SLIDE 15

kmalloc() data structures

[Diagram] The sizes[] array of size descriptors covers power-of-two buckets: 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072 bytes. Each size descriptor points to a chain of page descriptors, each holding a list of free blocks (bh); exhausted chains are NULL.

SLIDE 16

vmalloc()

  • Allocates virtually contiguous pages, which need not be physically contiguous.
  • Uses __get_free_page() to allocate physical frames.
  • Once all the required physical frames are found, the virtual addresses are created (and mappings set) in an unused part of the address space.
  • The virtual address search (for unused parts) on x86 begins at the next address after physical memory, on an 8 MB boundary.
  • One (virtual) page is left free after each allocation for cushioning.

SLIDE 17

vmalloc vs kmalloc

kmalloc allocates physically contiguous memory; vmalloc memory is only virtually contiguous.

kmalloc is faster but less flexible.

vmalloc involves __get_free_page() and may need to block to find a free physical page.

DMA requires contiguous physical memory.

SLIDE 18

Paging

  • All kernel segment pages are locked in memory (no swapping).
  • User pages can be paged out to either a complete block device (a swap partition) or a fixed-length file in a file system.
  • The first 4096 bytes of a swap area form a bitmap; a set bit indicates that the space for that page is available for paging.
  • At byte 4086, the signature string "SWAP-SPACE" is stored.
  • Hence, max swap of 4086*8 - 1 = 32687 pages = 130748 KB per device or file.
  • MAX_SWAPFILES specifies the number of swap files or devices.
  • A swap device is more efficient than a swap file.
SLIDE 19

Page Fault

  • The error code is written onto the stack, and the faulting virtual address is stored in register CR2.
  • do_page_fault(struct pt_regs *regs, unsigned long error_code) is then called.
  • If the faulting address is in the kernel segment, alarm messages are printed out and the process is terminated.
  • If the faulting address is not in a virtual memory area, check whether VM_GROWSDOWN is set for the next virtual memory area (i.e. the stack). If so, expand the VM area; if there is an error in expanding, send SIGSEGV.
  • If the faulting address is in a virtual memory area, check whether the protection bits are OK. If the access is not legal, send SIGSEGV; else, call do_no_page() or do_wp_page().

SLIDE 20

Page Replacement Algorithm

  • LRU - Least Recently Used replacement
  • NFU - Not Frequently Used replacement
  • Page-ageing-based replacement
  • Working Set algorithm, based on locality of reference per process
  • Working-Set-based clock algorithms
  • LRU with ageing and the Working Set algorithms are efficient and commonly used

SLIDE 21

Page Replacement handling in Linux kernel

  • Page Cache
  • Pages are added to the page cache for fast lookup.
  • Page cache pages are hashed by their address space and page index.
  • Inode or disk-block pages, shared pages, and anonymous pages form the page cache.
  • Swap-cached pages, also part of the page cache, represent the swapped pages.
  • Anonymous pages enter the swap cache at swap-out time; shared pages enter when they become dirty.

SLIDE 22

LRU Cache

  • The LRU cache is made up of active lists and inactive lists.
  • These lists are populated during page faults and when page-cached pages are accessed or referenced.
  • kswapd is the page-out kernel thread that balances the LRU cache and trickles out pages based on an approximation to the LRU algorithm.
  • The active lists contain referenced pages. These lists are monitored for page references through refill_inactive.
  • Referenced pages are given a chance to age through move-to-front, and unreferenced pages are moved to the inactive list.
  • The inactive lists contain the set of inactive-clean and inactive-dirty pages.
  • This set is scanned periodically, whenever the per-zone pages_high threshold for free pages is crossed.

SLIDE 23

Thank you