Memory Management in Linux
By: Rohan Garg (2002134), Gaurav Gupta (2002435)
Architecture Independent Memory Model
Process virtual address space divided into pages Page size given in PAGE_SIZE macro in asm/page.h
(4K for x86 and 8K for Alpha)
The pages are divided among 4 segments: User Code, User Data, Kernel Code, Kernel Data. In User mode, a process can access only User Code and User Data; in Kernel mode, the kernel also needs access to User Data (e.g. to copy system-call arguments).
Addressing the page table
Segment + Offset = 4 GB Linear address (32 bits) Of this, user space = 3 GB (defined by
TASK_SIZE macro) and kernel space = 1GB
A linear address is converted to a physical address using 3 levels:
Index into Page Dir. → Index into Page Middle Dir. → Index into Page Table → Page Offset
Requesting and Releasing Page Frames
alloc_pages(gfp_mask, order) :- used to request 2^order
contiguous page frames.
alloc_page(gfp_mask) :- returns the address of the
descriptor of the allocated page frame. For only one page.
__get_free_pages(gfp_mask, order) :- returns the linear
address of the first allocated page.
get_zeroed_page(gfp_mask) :- first invokes alloc_pages
and then fills it with zeros.
__get_dma_pages(gfp_mask, order) :- gets page frame
suitable for DMA.
GFP mask
The flag specifies how to look for free page frames. E.g. __GFP_WAIT :- the kernel is allowed to block the
current process while waiting for free page frames.
Freeing page frames
__free_pages(page, order) :- Decreases the count field of the
descriptor by 1; if it drops to 0, frees the 2^order contiguous page frames.
free_pages(addr, order) :- Same as __free_pages(), but takes the
linear address addr of the first page frame instead of a page descriptor.
__free_page(page) :- Releases the page frame
having page descriptor = page.
free_page(addr) :- Releases the page frame having
address = addr.
Finding a Physical Page
- unsigned long __get_free_pages(int priority, unsigned long order, int dma) in
mm/page_alloc.c
- Priority =
- GFP_BUFFER (free page returned only if available in
physical memory)
- GFP_ATOMIC (return page if possible, do not interrupt
current process)
- GFP_USER (current process can be interrupted)
- GFP_KERNEL (kernel can be interrupted)
- GFP_NOBUFFER (do not attempt to reduce buffer cache)
- order says give me 2^order pages (max is 128KB)
- dma specifies that it is for DMA purposes
Page descriptor
- Used to keep track of the current status of each page frame.
- Some of the key fields of the structure are described below:
- list:- contains pointers to the next and previous items in a doubly
linked list of page descriptors.
- count:- usage reference counter for the page. A value greater
than 0 implies the page frame is in use by one or more processes.
- flags:- describe the status of the page frame.
- LRU:- contains pointers to the least recently used doubly linked
list of pages.
- zone:- the zone to which the page frame belongs.
Buddy System Algorithm
Used for allocating groups of contiguous page frames
and helps in solving the problem of external fragmentation.
All free page frames are grouped into lists of blocks
containing groups of 1, 2, 4, 8,….,512 contiguous page frames.
If 128 contiguous page frames are required, list 128
is consulted first. If no block is found there, list 256 is consulted; if a block is found, it is split and the remaining 128 frames are added to list 128. If still not found, list 512 is consulted, and so on.
Slab Allocator
Runs on top of the basic buddy system algorithm. It does not discard allocated objects once they are freed
but keeps them cached in memory, thus avoiding reinitialization.
Creates pools of memory areas of the same type, called
caches.
Caches are divided into slabs, each slab consisting of
one or more contiguous page frames.
The slab allocator never releases the page frames of an
empty slab unless the kernel is looking for additional free page frames.
Interface between slab allocator and buddy system.
- void *kmem_getpages(kmem_cache_t *cachep, unsigned long flags)
{
    void *addr;
    flags |= cachep->gfpflags;
    addr = (void *) __get_free_pages(flags, cachep->gfporder);
    return addr;
}
- The slab allocator invokes this function to call the buddy system algorithm to
obtain a group of free contiguous page frames.
- Similarly kmem_freepages() is used by the slab allocator to release a
group of page frames.
Process Address Space
[Diagram: process address space layout, from top to bottom: Kernel at 0xC0000000; file name, environment and arguments; stack; bss (_bss_start to _end); data (_edata); code (_etext); header; shared libraries at 0x84000000]
Address Space Descriptor
- mm_struct defined in the process descriptor. (in linux/sched.h)
- This is shared (not duplicated) if CLONE_VM is specified on forking; otherwise the child gets its own copy.
- struct mm_struct {
    int count;                  // no. of processes sharing this descriptor
    pgd_t *pgd;                 // page directory ptr
    unsigned long start_code, end_code;
    unsigned long start_data, end_data;
    unsigned long start_brk, brk;
    unsigned long start_stack;
    unsigned long arg_start, arg_end, env_start, env_end;
    unsigned long rss;          // no. of pages resident in memory
    unsigned long total_vm;     // total # of bytes in this address space
    unsigned long locked_vm;    // # of bytes locked in memory
    unsigned long def_flags;    // status to use when mem regions are created
    struct vm_area_struct *mmap;     // ptr to first region desc.
    struct vm_area_struct *mmap_avl; // for faster search of region desc.
};
Memory Allocation for Kernel Segment
- Static
memory_start = console_init(memory_start, memory_end);
Typically done for drivers to reserve areas, and for some other kernel components.
- Dynamic
void *kmalloc(size, priority), void kfree(void *)
void *vmalloc(size), void vfree(void *)
kmalloc is used for physically contiguous pages, while vmalloc does not necessarily allocate physically contiguous pages. Memory allocated is not initialized (and is not paged out).
kmalloc() data structures
[Diagram: the sizes[] array of size descriptors covers block sizes 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536 and 131072 bytes; each size descriptor points to a chain of page descriptors whose free blocks (bh) are linked together, terminated by NULL]
vmalloc()
- Allocates virtually contiguous pages, but they do not need to be
physically contiguous.
- Uses __get_free_page() to allocate physical frames.
- Once all the required physical frames are found, the virtual
addresses are created (and mappings set) at an unused part.
- The virtual address search (for unused parts) on x86 begins at
the next address after physical memory on an 8 MB boundary.
- One (virtual) page is left free after each allocation for
cushioning.
vmalloc vs kmalloc
Contiguous vs non-contiguous physical memory kmalloc is faster but less flexible vmalloc involves __get_free_page() and may need to
block to find a free physical page
DMA requires contiguous physical memory
Paging
- All kernel segment pages are locked in memory (no swapping)
- User pages can be paged out to:
a complete block device, or fixed-length files in a file system
- Of the first 4096 bytes, the first 4086 form a bitmap; a set bit indicates that
space for that page is available for paging.
- At byte 4086, the string "SWAP_SPACE" is stored.
- Hence, max swap of 4086*8 - 1 = 32687 pages = 130748 KB per
device or file
- MAX_SWAPFILES specifies number of swap files or devices
- Swap device is more efficient than swap file.
Page Fault
- Error code written onto stack, and the VA is stored in register
CR2
- do_page_fault(struct pt_regs *regs, unsigned long
error_code) is now called.
- If faulting address is in kernel segment, alarm messages are
printed out and the process is terminated.
- If the faulting address is not in a virtual memory area, check if
VM_GROWSDOWN for the next virtual memory area is set (i.e. the stack). If so, expand the VM area; if the expansion fails, send SIGSEGV.
- If faulting address is in a virtual memory area, check if
protection bits are OK. If not legal, send SIGSEGV. Else, call
do_no_page() or do_wp_page().
Page Replacement Algorithm
- LRU – Least Recently Used replacement
- NFU – Not Frequently Used replacement
- Page-ageing-based replacement
- Working Set algorithm based on locality of references per
process
- Working Set based clock algorithms
- LRU with ageing and the Working Set algorithms are efficient and
commonly used
Page Replacement handling in Linux kernel
- Page Cache
- Pages are added to the Page cache for fast lookup.
- Page cache pages are hashed based on their address space and
page index.
- Inode or disk block pages, shared pages and anonymous pages
form the page cache.
- Swap-cache pages, also part of the page cache, represent pages that
have been swapped out.
- Anonymous pages enter the swap cache at swap-out time and
shared pages enter when they become dirty.
LRU Cache
- LRU cache is made up of active lists and inactive lists.
- These lists are populated during page faults and when page cached
pages are accessed or referenced.
- kswapd is the page out kernel thread that balances the LRU cache and
trickles out pages based on an approximation to LRU algorithm.
- The active list contains referenced pages. It is scanned for page
references through refill_inactive.
- Referenced pages are given a chance to age (move-to-front),
while unreferenced pages are moved to the inactive list.
- The inactive list contains the sets of inactive-clean and inactive-dirty
pages.
- This set is monitored periodically, and whenever the pages_high
threshold for free pages is crossed on a per-zone basis.