Processes and the Kernel Jeff Chase Duke University - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Processes and the Kernel

Jeff Chase
Duke University

slide-2
SLIDE 2

OS Platform: A Model

OS platform: same for all applications on a system. E.g., classical OS kernel.
Libraries/frameworks: packaged code used by multiple applications.
Applications/services. May interact and serve one another.

OS mediates access to shared resources. That requires protection and isolation.

[RAD Lab] Protection boundary API API

slide-3
SLIDE 3

Operating Systems: The Classical View

data data

Programs run as independent processes.
Protected system calls ...and upcalls (e.g., signals).
Protected OS kernel mediates access to shared resources.
Threads enter the kernel for OS services.
Each process has a private virtual address space and one or more threads.
The kernel code and data are protected from untrusted processes.

slide-4
SLIDE 4

Android Security Architecture

[http://developer.android.com/guide/topics/security/permissions.html]

“A central design point of the Android security architecture is that no application, by default, has permission to perform any operations that would adversely impact other applications, the operating system, or the user. This includes reading or writing the user's private data (such as contacts or emails), reading or writing another application's files, performing network access, keeping the device awake, and so on. Because each Android application operates in a process sandbox, applications must explicitly share resources and data. They do this by declaring the permissions they need for additional capabilities not provided by the basic sandbox. Applications statically declare the permissions they require, and the Android system prompts the user for consent at the time the application is installed. Android has no mechanism for granting permissions dynamically (at run-time) because it complicates the user experience to the detriment of security.”

Isolation and Sharing

slide-5
SLIDE 5

Program

Running a program

When a program launches, the OS initializes a process with a virtual memory to store the running program’s code and data. Typically it sets up the segments by memory-mapping sections of the executable file.

data

code (“text”) constants initialized data sections segments Process with virtual memory mapped file regions

slide-6
SLIDE 6

What’s in an Object File or Executable?

int j = 327;
char* s = "hello\n";
char sbuf[512];

int p() {
    int k = 0;
    j = write(1, s, 6);
    return(j);
}

[Figure: layout of an object file or executable]

header: “magic number” indicates type of file/image.
text: program instructions (p)
idata: immutable data (constants), “hello\n”
wdata: writable global/static data (j, s)
symbol table (j, s, p, sbuf) and relocation records: used by linker; may be removed after final link step and strip. Also includes info for debugger.

Section table: an array of (offset, len, startVA) sections.

slide-7
SLIDE 7

Building and running a program

chase:lab1> make
gcc -I. -Wall -lm -DNDEBUG -c dmm.c
…
gcc -I. -Wall -lm -DNDEBUG -o test_basic test_basic.c dmm.o
gcc -I. -Wall -lm -DNDEBUG -o test_coalesce test_coalesce.c dmm.o
gcc -I. -Wall -lm -DNDEBUG -o test_stress1 test_stress1.c dmm.o
gcc -I. -Wall -lm -DNDEBUG -o test_stress2 test_stress2.c dmm.o
chase:lab1>
chase:lab1> ./test_basic
calling malloc(10)
call to dmalloc() failed
chase:lab1>

slide-8
SLIDE 8

The Birth of a Program (C/Ux)

int j;
char* s = "hello\n";

int p() {
    j = write(1, s, 6);
    return(j);
}

[Figure: the build pipeline]

header files + myprogram.c → compiler → myprogram.s (p: store this, store that, push, jsr _write, ret, etc.) → assembler → myprogram.o (object file) → linker, together with libraries and other object files or archives → myprogram (program: executable file)

slide-9
SLIDE 9

Memory segments: a view from C

  • Globals:
    – Fixed-size segment
    – Writable by user program
    – May have initial values

  • Text (instructions)
    – Fixed-size segment
    – Executable
    – Not writable

  • Heap and stack
    – Variable-size segments
    – Writable
    – Zero-filled on demand

[Figure: a CPU core whose registers (RCX, PC/RIP, SP/RBP) reference the segments: text, globals, heap, stack]

slide-10
SLIDE 10

Linux x86-64 VAS layout

[Figure: example Linux x86-64 virtual address space map, low to high addresses, showing the program's text, idata, and data segments, the heap, shared libraries (e.g., libc.so), and the stack, with permissions (r-x, r--, rw-) and addresses ranging from 0x400000 to 0x7fff1375c000]

Example: the details aren’t important.

slide-11
SLIDE 11

Today

  • The operating system kernel!

    – What is it?
    – Where is it?
    – How do we get there?
    – How is it protected?
    – How does it control resources?
    – How does it control access to data?
    – How does it keep control?

[Figure: kernel in kernel space; user processes/segments in user space]

slide-12
SLIDE 12

Precap: the kernel

  • Today, all “real” operating systems have protected kernels.
  • The kernel resides in a well-known file: the machine automatically

loads it into memory and starts it (boot) on power-on/reset.

  • The kernel is (mostly) a library of service procedures shared by all

user programs, but the kernel is protected:

  • User code cannot access internal kernel data structures directly.
  • User code can invoke the kernel only at well-defined entry points

(system calls).

  • Kernel code is “just like” user code, but the kernel is privileged:
  • The kernel has direct access to all machine functions, and defines

the handler entry points for CPU events: trap, fault, interrupt.

  • Once booted, the kernel acts as one big event handler.
slide-13
SLIDE 13

The kernel

  • The kernel is just a program: a collection of

modules and their state.

  • E.g., it may be written in C and compiled/

linked a little differently.

– E.g., linked with –static option: no dynamic libs

  • At runtime, kernel code and data reside in a

protected range of virtual addresses.

    – The range (kernel space) is “part of” every VAS.
    – VPN->PFN translations for kernel space are global.

  • (Details vary by machine and OS configuration)

    – Access to kernel space is denied for user programs.
    – Portions of kernel space may be non-pageable and/or direct-mapped to machine memory.

kernel code kernel data kernel space user space VAS 0x0 high

slide-14
SLIDE 14

Example: Windows/IA32

Kernel space: high-order bit set in virtual address.
User spaces: one per VAS; each occupies the “low half” of the VAS (2GB).
Alternative configuration allows user spaces larger than 2GB: kernel space is where the two highest bits are set (0xc00..).

slide-15
SLIDE 15

Windows IA-32 (Kernel space)

The point is: There are lots of different regions within kernel space to meet internal OS needs.

  • page tables for various VAS
  • page table for kernel space itself
  • file block cache
  • internal data structures

The details aren’t important.

slide-16
SLIDE 16

registers CPU core

R0 Rn PC x mode

The current mode of a CPU core is represented by a field in a protected register. We consider only two possible values: user mode or kernel mode (also called protected mode or supervisor mode). If the core is in protected mode then it can:

  • access kernel space
  • access certain control registers
  • execute certain special instructions

CPU mode: user and kernel

U/K

If software attempts to do any of these things when the core is in user mode, then the core raises a CPU exception (a fault).

slide-17
SLIDE 17

x86 control registers

See [en.wikipedia.org/wiki/Control_register] The details aren’t important.

slide-18
SLIDE 18

Entering the kernel

  • Suppose a CPU core is running user code in

user mode:

    – The user program controls the core.
    – The core goes where the program code takes it…
    – …as determined by its register state (context) and the values encountered in memory.

  • How does the OS get control back? How

does the core switch to kernel mode?

– CPU exception: trap, fault, interrupt

  • On an exception, the CPU transitions to kernel

mode and resets the PC and SP registers.

    – Set the PC to execute a pre-designated handler routine for that exception type.
    – Set the SP to a pre-designated kernel stack.

[Figure: safe control transfer from user space into kernel space (kernel code, kernel data)]

slide-19
SLIDE 19

Exceptions and interrupts (“trap, fault, interrupt”)

synchronous: caused by an instruction
asynchronous: caused by some other event

intentional (happens every time):
    – synchronous: trap (system call): open, close, read, write, fork, exec, exit, wait, kill, etc.
    – asynchronous: “software interrupt”: software requests an interrupt to be delivered at a later time
unintentional (contributing factors):
    – synchronous: fault: invalid or protected address or opcode, page fault, overflow, etc.
    – asynchronous: interrupt: caused by an external event: I/O op completed, clock tick, power fail, etc.

Usage note: some sources say that exceptions only occur as a result of executing an instruction, and so interrupts are not exceptions. But they are all examples of exceptional changes in control flow due to an event.

slide-20
SLIDE 20

Entry to the kernel

syscall trap/return fault/return interrupt/return

The handler accesses the core register context to read the details of the exception (trap, fault, or interrupt). It may call other kernel routines. Every entry to the kernel is the result of a trap, fault, or interrupt. The core switches to kernel mode and transfers control to a handler routine.

OS kernel code and data for system calls (files, process fork/ exit/wait, pipes, binder IPC, low-level thread support, etc.) and virtual memory management (page faults, etc.) I/O completions timer ticks

slide-21
SLIDE 21

Syscalls/traps

  • Programs in C, C++, etc. invoke system calls by linking to a standard library (libc) whose syscall stubs are written in assembly.

    – The library defines a stub or wrapper routine for each syscall.
    – Stub executes a special trap instruction (e.g., chmk or callsys or syscall/sysenter instruction) to change mode to kernel.
    – Syscall arguments/results are passed in registers (or user stack).
    – OS+machine defines Application Binary Interface (ABI).

read() in Unix libc.a Alpha library (executes in user mode):

#define SYSCALL_READ 27        # op ID for a read system call

move arg0…argn, a0…an          # syscall args in registers A0..AN
move SYSCALL_READ, v0          # syscall dispatch index in V0
callsys                        # kernel trap
move r1, _errno                # errno = return status
return

Example read syscall stub for Alpha CPU ISA (defunct)
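The same stub idea can be tried on a present-day system. A minimal sketch, assuming Linux with glibc: the generic `syscall()` wrapper places the dispatch number and arguments where the ABI expects them and executes the trap instruction. The helper name `raw_write` is hypothetical and not part of any library.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write: the syscall dispatch number */
#include <unistd.h>        /* syscall(): glibc's generic trap wrapper */

/* Hypothetical helper: invoke the write system call by number,
 * bypassing the usual write() stub. Returns the number of bytes
 * written, or -1 with errno set (the libc error convention). */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling `raw_write(1, "hi\n", 3)` writes to standard output just as `write()` would; a bad descriptor such as -1 comes back as -1 with `errno` set to `EBADF`.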

slide-22
SLIDE 22

MacOS x86-64 syscall example

section .data
hello_world db "Hello World!", 0x0a

section .text
global start

start:
    mov rax, 0x2000004      ; System call write = 4
    mov rdi, 1              ; Write to standard out = 1
    mov rsi, hello_world    ; The address of hello_world string
    mov rdx, 13             ; The size to write: 12 characters plus the newline
    syscall                 ; Invoke the kernel
    mov rax, 0x2000001      ; System call number for exit = 1
    mov rdi, 0              ; Exit success = 0
    syscall                 ; Invoke the kernel

http://thexploit.com/secdev/mac-os-x-64-bit-assembly-system-calls/

Illustration only: this program writes “Hello World!” to standard output (fd == 1), ignores the syscall error return, and exits. The details aren’t important.

slide-23
SLIDE 23

Linux x64 syscall conventions (ABI)

Illustration only: the details aren’t important. (user buffer addresses)

slide-24
SLIDE 24

Virtual resource sharing

time à à space

Understand that the OS kernel implements resource allocation (memory, CPU,…) by manipulating name spaces and contexts visible to user code. The kernel retains control of user contexts and address spaces via the machine’s limited direct execution model, based on protected mode and exceptions.

slide-25
SLIDE 25

Hear the fans blow

int main() { while(1); }

How does the OS regain control of the core from this program? No system calls! No faults! How to give someone else a chance to run? How to “make” processes share machine resources fairly?

slide-26
SLIDE 26

Timer interrupts

[Timeline, time →: boot time, u-start, while(1); …, clock interrupt, interrupt return, resume. User mode runs above kernel mode; the kernel has a “top half” and a “bottom half” (interrupt handlers).]

The system clock (timer) interrupts periodically, giving control back to the kernel. The kernel can do whatever it wants, e.g., switch threads.

Enables timeslicing
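A user-level analogy can make this concrete. The sketch below is not the hardware timer interrupt itself: it assumes a POSIX system and asks the kernel to deliver a periodic SIGALRM upcall with `setitimer()`. The loop never makes a system call, yet control is taken away from it on every tick. The names `on_tick` and `spin_until_ticks` are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks = 0;

/* Runs asynchronously each time the interval timer fires. */
static void on_tick(int signo)
{
    (void)signo;
    ticks++;
}

/* Spin in a loop that never calls the kernel until `want` timer ticks
 * have arrived. Returns the number of ticks observed, or -1 on error. */
int spin_until_ticks(int want)
{
    struct sigaction sa;
    struct itimerval every_10ms, off;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    memset(&every_10ms, 0, sizeof every_10ms);
    every_10ms.it_interval.tv_usec = 10000;   /* repeat every 10 ms */
    every_10ms.it_value.tv_usec = 10000;      /* first tick after 10 ms */
    ticks = 0;
    if (setitimer(ITIMER_REAL, &every_10ms, NULL) == -1)
        return -1;

    while (ticks < want)
        ;   /* busy loop: only the timer event can change `ticks` */

    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);       /* disarm the timer */
    return (int)ticks;
}
```

`spin_until_ticks(3)` returns after roughly 30 ms even though the loop body does nothing to give up the core, which is the property timeslicing relies on.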

slide-27
SLIDE 27

Memory Allocation

How should an OS allocate its memory resources among contending demands?

    – Virtual address spaces: fork, exec, sbrk, page fault.
    – The kernel controls how many machine memory frames back the pages of each virtual address space.
    – The kernel can take memory away from a VAS at any time.
    – The kernel always gets control if a VAS (or rather a thread running within a VAS) asks for more.
    – The kernel controls how much machine memory to use as a cache for data blocks whose home is on slow storage.
    – Policy choices: which pages or blocks to keep in memory? And which ones to evict from memory to make room for others?

slide-28
SLIDE 28

What is a Virtual Address Space?

  • Protection domain
    – A “sandbox” for threads that limits what memory they can access for read/write/execute.
    – A “lockbox” that limits which threads can access any given segment of virtual memory.

  • Uniform name space
    – Threads access their code and data items without caring where they are in machine memory, or even if they are resident in memory at all.

  • A set of V→P translations
    – A level of indirection mapping virtual pages to page frames.
    – The OS kernel controls the translations in effect at any time.

slide-29
SLIDE 29

Virtual Address Translation

Example only: a typical 32-bit architecture with 4KB pages.

Virtual address translation maps a virtual page number (VPN) to a page frame number (PFN) in machine memory: the rest is easy.

virtual address { VPN, offset (12 bits) } → address translation → machine address { PFN + offset }

Deliver fault to OS if translation is not valid and accessible in requested mode.
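The arithmetic behind the example can be written out directly. A minimal sketch, assuming the slide's 32-bit addresses and 4KB pages; the function names are illustrative only.

```c
#include <stdint.h>

#define PAGE_SHIFT  12u                    /* 4KB pages: 2^12 bytes */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1u)

/* Upper 20 bits of a 32-bit virtual address: the virtual page number. */
uint32_t vpn_of(uint32_t vaddr)
{
    return vaddr >> PAGE_SHIFT;
}

/* Lower 12 bits: the byte offset within the page. */
uint32_t offset_of(uint32_t vaddr)
{
    return vaddr & OFFSET_MASK;
}

/* "The rest is easy": attach the unchanged offset to the frame number. */
uint32_t machine_addr(uint32_t pfn, uint32_t offset)
{
    return (pfn << PAGE_SHIFT) | (offset & OFFSET_MASK);
}
```

For the virtual address 0x12345678, the VPN is 0x12345 and the offset is 0x678; if that page maps to frame 0xABC, the machine address is 0xABC678.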

slide-30
SLIDE 30

Virtual memory faults

  • Machine memory is “just a cache” over files and

segments: a page fault is “just a cache miss”.

    – Machine passes faulting address to kernel (e.g., x86 control register CR2) with fault type and faulting PC.
    – Kernel knows which virtual space is active on the core (e.g., x86 control register CR3).
    – Kernel consults other data structures related to virtual memory to figure out how to resolve the fault.
    – If the fault indicates an error, then signal/kill the process.
    – Else construct (or obtain) a frame containing the missing page, install the missing translation in the page table, and resume the user code, restarting the faulting instruction.

The x86 details are examples: not important.

slide-31
SLIDE 31

Virtual Addressing: Under the Hood

[Flowchart. MMU (start here): probe TLB. Hit: access physical memory. Miss: probe page table; if the access is valid, load TLB and access physical memory; otherwise raise exception. OS: page fault? Illegal reference: kill. Legal reference: (lookup and/or) allocate frame; page on disk? Yes: fetch from disk. No (first reference): zero-fill. Then load TLB.]

How to monitor page reference events/frequency along the fast path?

slide-32
SLIDE 32

“Limited direct execution”

[Timeline: boot time, u-start, syscall trap, u-return, u-start, fault, u-return, fault, interrupt, interrupt return. User mode runs above kernel mode; the kernel has a “top half” and a “bottom half” (interrupt handlers).]

The kernel executes a special instruction to transition to user mode (labeled as “u-return”), with selected values in CPU registers.

User code runs on a CPU core in user mode in a user space. If it tries to do anything weird, the core transitions to the kernel, which takes over.

slide-33
SLIDE 33

An analogy

  • Each thread/context transfers

control from user process/mode to kernel and back again.

  • User can juggle ball (execute)

before choosing to hit it back.

  • But kernel can force user to

return the ball at any time.

  • Kernel can juggle or hide the ball

(switch thread out) before hitting it back to user.

  • Kernel can drop ball at any time.
  • Kernel is a multi-armed robot

who plays many users at once.

  • At most one ball in play for each

core/slot at any given time.

slide-34
SLIDE 34

The kernel must be bulletproof

Syscalls indirect through syscall dispatch table by syscall number (trap → read() {…}, write() {…}). No direct calls to kernel routines from user space!

What about references to kernel data objects passed as syscall arguments (e.g., file to read or write)? Use an integer index into a kernel table that points at the data object. The value is called a handle or descriptor. No direct pointers to kernel data from user space!

Kernel interprets pointer arguments in context of the user VAS, and copies the data in/out of kernel space with copyin/copyout (e.g., for read and write syscalls).

Kernel copies all arguments into kernel space and validates them.

Secure kernels handle system calls verrry carefully.
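The dispatch-by-number idea can be sketched as a toy in plain C. Everything here is hypothetical (the syscall numbers, the `file_table`, the handler names); returning a negative errno value is an assumption borrowed from the convention Linux uses inside the kernel. The point is that the only way in is `trap_handler`, which bounds-checks the number before indexing the table, and each handler validates its descriptor before touching a kernel object.

```c
#include <errno.h>
#include <stddef.h>

#define NSYSCALLS 3
#define NFILES    4

/* Toy kernel objects: user code only ever names them by index. */
struct file { int in_use; long size; };
static struct file file_table[NFILES] = { {1, 100}, {0, 0}, {1, 42}, {0, 0} };

typedef long (*syscall_fn)(long arg);

/* Validate the descriptor before using the object it names. */
static long sys_fsize(long fd)
{
    if (fd < 0 || fd >= NFILES || !file_table[fd].in_use)
        return -EBADF;
    return file_table[fd].size;
}

static long sys_nfiles(long unused)
{
    (void)unused;
    return NFILES;
}

/* Dispatch table indexed by syscall number; slot 0 is left empty. */
static const syscall_fn dispatch[NSYSCALLS] = { NULL, sys_fsize, sys_nfiles };

/* The single entry point reached by the trap. */
long trap_handler(long number, long arg)
{
    if (number < 0 || number >= NSYSCALLS || dispatch[number] == NULL)
        return -ENOSYS;
    return dispatch[number](arg);
}
```

A caller that passes syscall number 7, or descriptor 99, gets an error code back instead of a wild jump or a wild pointer dereference inside the kernel.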

user buffers User program / user space kernel

slide-35
SLIDE 35

VM and files: the story so far

Files on “disk”

Program

Process

(running program) File system calls (e.g., open/read/write) register context globals heap stack text

Segments (regions) in Virtual Address Space

Thread Memory-mapped sections of program file Anonymous Segments (zero-fill) Per-file inodes indexed with logical blockID #. Read disk block address from map entry.

slide-36
SLIDE 36

Recap: timers, interrupts, faults, etc.

  • When processor core is running a user program, the

user program/thread controls (“drives”) the core.

  • The hardware has a timer device that interrupts the

core after a given interval of time.

  • Interrupt transfers control back to the OS kernel, which

may switch the core to another thread, or resume.

  • Other events also return control to the kernel.

    – Wild pointers
    – Divide by zero
    – Other program actions
    – Page faults

slide-37
SLIDE 37

Recap: OS protection

Know how a classical OS uses the hardware to protect itself and implement a limited direct execution model for untrusted user code.

  • Virtual addressing. Applications run in sandboxes that prevent

them from calling procedures in the kernel or accessing kernel data directly (unless the kernel chooses to allow it).

  • Events. The OS kernel installs handlers for various machine events

when it boots (starts up). These events include machine exceptions (faults), which may be caused by errant code, interrupts from the clock or external devices (e.g., network packet arrives), and deliberate kernel calls (traps) caused by programs requesting service from the kernel through its API.

  • Designated handlers. All of these machine events make safe control transfers into the kernel handler for the named event. In fact, once the system is booted, these events are the only ways to ever enter the kernel, i.e., to run code in the kernel.

slide-38
SLIDE 38

EXTRA SLIDES

I hope we get to here

slide-39
SLIDE 39

Concept: isolation

Butler Lampson’s definition: “I am isolated if anything that goes wrong is my fault (or my program’s fault).” Three dimensions of isolation for protected contexts (e.g., processes):

  • Fault isolation. One app or app instance (process) can fail independently of others. If it runs amok, the OS can kill it and reclaim its memory, etc.
  • Performance isolation. The OS manages resources (“metal and glass”:

computing power, memory, disk space, I/O bandwidth, etc.). Each instance needs the “right amount” of resources to run properly. The OS prevents apps from impacting the performance of other apps.

  • Security. An app may contain malware that tries to corrupt the system,

steal data, or otherwise compromise the integrity of the system. The OS uses protected contexts and a reference monitor to check and authorize all accesses to data or objects.

slide-40
SLIDE 40

Architectural foundations

  • A CPU event (an interrupt or exception, i.e., a trap or fault) is an

“unnatural” change in control flow.

  • Like a procedure call, an event changes the PC register.
  • Also changes mode or context (current stack), or both.
  • Events do not change the current space!
  • On boot, the kernel defines a handler routine for each event type.
  • The machine defines the event types.
  • Event handlers execute in kernel mode.
  • Every kernel entry results from an event.
  • Enter at the handler for the event.

control flow event handler (e.g., ISR: Interrupt Service Routine) exception.cc

In some sense, the whole kernel is a “big event handler.”

slide-41
SLIDE 41

Protecting Entry to the Kernel

Protected events and kernel mode are the architectural foundations of kernel-based OS (Unix, Windows, etc).

    – The machine defines a small set of exceptional event types.
    – The machine defines what conditions raise each event.
    – The kernel installs handlers for each event at boot time, e.g., a table in kernel memory read by the machine.

The machine transitions to kernel mode only on an exceptional event. The kernel defines the event handlers. Therefore the kernel chooses what code will execute in kernel mode, and when.
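The argument can be modeled in a few lines. This is a toy simulation under stated assumptions, not real hardware behavior: `handler_table` stands in for the table the machine reads, `raise_event` plays the role of the machine, and all names are hypothetical.

```c
#include <stddef.h>

enum cpu_mode { USER_MODE, KERNEL_MODE };
enum { EV_TRAP, EV_FAULT, EV_INTERRUPT, EV_COUNT };

typedef void (*handler_fn)(void);

enum cpu_mode current_mode = USER_MODE;  /* the protected mode field */
handler_fn handler_table[EV_COUNT];      /* table "read by the machine" */
int handled[EV_COUNT];                   /* times each handler has run */
int always_in_kernel_mode = 1;           /* cleared if a handler runs in user mode */

static void note(int ev)
{
    handled[ev]++;
    if (current_mode != KERNEL_MODE)
        always_in_kernel_mode = 0;
}

static void on_trap(void)      { note(EV_TRAP); }
static void on_fault(void)     { note(EV_FAULT); }
static void on_interrupt(void) { note(EV_INTERRUPT); }

/* Boot: the kernel installs one handler per event type. */
void boot(void)
{
    handler_table[EV_TRAP] = on_trap;
    handler_table[EV_FAULT] = on_fault;
    handler_table[EV_INTERRUPT] = on_interrupt;
}

/* The "machine": switch to kernel mode, enter the designated handler,
 * then return to user mode. Returns -1 for an undefined event. */
int raise_event(int ev)
{
    if (ev < 0 || ev >= EV_COUNT || handler_table[ev] == NULL)
        return -1;
    current_mode = KERNEL_MODE;
    handler_table[ev]();
    current_mode = USER_MODE;   /* models the return-from-exception step */
    return 0;
}
```

Because `raise_event` is the only code that sets kernel mode, and it always jumps to an entry in a table that only `boot` fills in, the kernel alone decides what runs in kernel mode.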

user kernel

interrupt or fault trap/return interrupt or fault

slide-42
SLIDE 42

Example handlers

  • Illegal operation
    – Reserved opcode, divide-by-zero, illegal access
    – That’s a fault! Kernel generates a signal to user program, e.g., to kill it or invoke an application’s exception handler.

  • Page fault
    – Case 1: Fetch page (or zero it), map it in PTE, restart instruction
    – Case 2: Signal error (e.g., “segmentation fault”)

  • Interrupts
    – I/O completion, e.g., disk read complete: resume a program
    – Arriving network packet, etc.: kick the network stack
    – Clock ticks (timer interrupt): maybe do a context switch
    – Power fail etc.: save state

slide-43
SLIDE 43

Example: Unix file I/O

char buf[BUFSIZE];
int fd;
ssize_t n;

if ((fd = open("../zot", O_TRUNC | O_RDWR)) == -1) {
    perror("open failed");
    exit(1);
}
/* Copy standard input to the file, writing only the bytes actually read. */
while ((n = read(0, buf, BUFSIZE)) > 0) {
    if (write(fd, buf, n) != n) {
        perror("write failed");
        exit(1);
    }
}

An open file is represented by an integer file descriptor value returned by the kernel. Pass the file descriptor value back to kernel to reference the open file on subsequent syscalls.

Read/write syscalls pass virtual address of a user-space buffer. For a write, the kernel retrieves data from the buffer and copies it in to kernel space. For a read, the kernel copies the data out of kernel space and places it into the buffer.

slide-44
SLIDE 44

Unix “file descriptors” illustrated

[Figure: in kernel space, the per-process descriptor table holds pointers into the system-wide open file table, whose entries refer to a pipe, file, socket, or tty]

Disclaimer: this drawing is oversimplified (and we will talk about pipes, sockets, and tty later).

Processes often reference OS kernel objects with integers that index into a table of pointers in the kernel. Windows calls them handles.

Example: a Unix file descriptor is a value stored in an ordinary integer variable in a user program. The kernel chooses the value for the descriptor: when the program opens a file, the kernel selects a free entry in the descriptor table and returns its index as the value. The program remembers the number and uses it to name the open file for subsequent system calls. int fd
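The descriptor table can be sketched as a small array of pointers. This is a toy model with hypothetical names (`fd_table`, `fd_alloc`, `fd_lookup`, `fd_close`), not kernel source. The slide says only that the kernel selects a free entry; the sketch picks the lowest free one, which is what POSIX specifies for open().

```c
#include <stddef.h>

#define MAX_FDS 8

struct open_file { const char *name; long offset; };   /* kernel object */

/* Per-process descriptor table: an array of pointers into kernel space. */
struct fd_table { struct open_file *slot[MAX_FDS]; };

/* Install `f` in the lowest free entry; return its index, or -1 if full. */
int fd_alloc(struct fd_table *t, struct open_file *f)
{
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (t->slot[fd] == NULL) {
            t->slot[fd] = f;
            return fd;
        }
    }
    return -1;
}

/* Translate a descriptor back to its object; reject bad or stale values. */
struct open_file *fd_lookup(const struct fd_table *t, int fd)
{
    if (fd < 0 || fd >= MAX_FDS)
        return NULL;
    return t->slot[fd];
}

/* Free the entry so its index can be reused. Returns 0, or -1 if the
 * descriptor does not name an open file. */
int fd_close(struct fd_table *t, int fd)
{
    if (fd_lookup(t, fd) == NULL)
        return -1;
    t->slot[fd] = NULL;
    return 0;
}
```

The user program holds only the small integer. Every use goes through `fd_lookup`, so a garbage value yields an error rather than a reference to arbitrary kernel memory.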

slide-45
SLIDE 45

Anatomy of a read syscall

  1. Compute (user mode).
  2. Enter kernel for read syscall.
  3. Figure out what disk blocks to fetch, and fetch them into kernel buffers (disk: seek, transfer (DMA)).
  4. Sleep for I/O (stall).
  5. Copy data from kernel buffer to user buffer in read (kernel mode).
  6. Return to user mode.

[Figure: CPU and disk activity along a time axis]

slide-46
SLIDE 46

Safe copy primitives

copyin()
Copies len bytes of data from the user-space address uaddr to the kernel-space address kaddr.

copyout()
<copyout copies out of the kernel and in to the user-space buffer.>

copyinstr()
Copies a NUL-terminated string, at most len bytes long, from user-space address uaddr to kernel-space address kaddr. The number of bytes actually copied, including the terminating NUL, is returned in *done…

RETURN VALUES
The copy functions return 0 on success or EFAULT if a bad address is encountered. In addition, the copystr(), and copyinstr() functions return ENAMETOOLONG if the string is longer than len bytes.

This slide clarifies the safe copy primitives used by kernel syscall handlers. The names may be confusing to some of us (;-). Copyin to the kernel, copyout from the kernel. This is an example from BSD Unix systems: the details aren’t important, but note the safety features.

[From http://www.unix.com/man-page/FreeBSD/9/copyout/]
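The safety property is easy to model. The sketch below is a toy, not the BSD implementation: it assumes the whole user address space is one array and treats a "user address" as an offset into it, whereas a real kernel consults the page tables. The names `toy_copyin` and `toy_copyout` are hypothetical; the argument order and the 0/EFAULT return convention follow the man page text above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define USER_SPACE_SIZE 4096u

/* Toy model of user memory; a "user address" is an index into it. */
unsigned char user_space[USER_SPACE_SIZE];

/* Range check written so that uaddr + len cannot overflow. */
static int user_range_ok(size_t uaddr, size_t len)
{
    return uaddr <= USER_SPACE_SIZE && len <= USER_SPACE_SIZE - uaddr;
}

/* Copy len bytes from user address uaddr into the kernel buffer kaddr. */
int toy_copyin(size_t uaddr, void *kaddr, size_t len)
{
    if (!user_range_ok(uaddr, len))
        return EFAULT;
    memcpy(kaddr, user_space + uaddr, len);
    return 0;
}

/* Copy len bytes from the kernel buffer kaddr out to user address uaddr. */
int toy_copyout(const void *kaddr, size_t uaddr, size_t len)
{
    if (!user_range_ok(uaddr, len))
        return EFAULT;
    memcpy(user_space + uaddr, kaddr, len);
    return 0;
}
```

A syscall handler that always goes through these two routines fails cleanly with EFAULT on a bad pointer argument instead of reading or overwriting memory outside the caller's space.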
slide-47
SLIDE 47

Inside the VAS

[http://manrix.sourceforge.net/microkernelservice.htm]

Each map entry points to a descriptor for the segment (a vm_object).

The triangles represent VM objects (segments). The dots represent pages within segments. A segment may have any number of pages resident.

The vm_map is a linked list of map entries, one for each segment, sorted by starting virtual address.

“Vnode” refers to the inode for the underlying (backing) file.

The kernel keeps a vm_map for each VAS. This data structure is used in Mach-derived kernels, including BSD Unix and Mac OS X.

slide-48
SLIDE 48

Inside the VAS

[http://manrix.sourceforge.net/microkernelservice.htm]

Text and initialized static data are mapped from the executable file. Missing pages may be fetched from the (backing) file on demand. (heap) The stack and heap are zero-filled virtual memory: called anonymous because the backing file has no name (i.e., no links: it is destroyed if the process dies). Pages from anonymous segments are initialized to zero, but the process may write to them. If they are evicted from memory the contents must be stored somewhere on disk.

slide-49
SLIDE 49

Virtual memory faults (2)

  • Kernel searches maps for the object mapped at that address. No object? Then it’s an error, e.g., segmentation fault.
  • Kernel checks intended mode of access for the object (rwx). Access not allowed? Then it’s an error, e.g., protection fault.

  1. Run down the vm_map for the VAS to find the segment/region containing the faulting address.
  2. If we find the segment, check the protection to see if the access is legal.
  3. If the access is legal, identify the backing object containing the page.

[Figure: vm_map entries pointing to vm_objects]
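The first two steps amount to a walk down a sorted list. A minimal sketch with hypothetical types and names (`map_entry`, `check_fault`, the `VM_*` bits); it is not the Mach or BSD data structure, and it stops once the segment is found rather than going on to the backing object.

```c
#include <stddef.h>
#include <stdint.h>

#define VM_R 1u
#define VM_W 2u
#define VM_X 4u

enum fault_result { FAULT_OK, FAULT_SEGV, FAULT_PROT };

struct map_entry {
    uintptr_t start, end;        /* segment covers [start, end) */
    unsigned prot;               /* allowed access: VM_R | VM_W | VM_X */
    struct map_entry *next;      /* list is sorted by start address */
};

/* Find the segment containing addr and check the requested access.
 * On success, *found (if non-NULL) receives the matching entry. */
enum fault_result check_fault(const struct map_entry *map, uintptr_t addr,
                              unsigned access, const struct map_entry **found)
{
    /* Sorted order lets the walk stop at the first entry past addr. */
    for (const struct map_entry *e = map; e != NULL && e->start <= addr; e = e->next) {
        if (addr < e->end) {
            if ((e->prot & access) != access)
                return FAULT_PROT;      /* segment exists, access not allowed */
            if (found != NULL)
                *found = e;
            return FAULT_OK;
        }
    }
    return FAULT_SEGV;                  /* nothing mapped at addr */
}
```

With a text segment at [0x1000, 0x2000) mapped read/execute, a read of 0x1800 succeeds, a write to 0x1800 is a protection fault, and any access to an unmapped hole is a segmentation fault.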

slide-50
SLIDE 50

Virtual memory faults (3)

  • Is the missing page (object/offset) in a memory frame somewhere,

but just missing from the page table?

– Index page cache (object/offset hash table) to find out. (The page could be resident if the segment or backing object is shared, and the page is resident in memory on behalf of some other process.)

  • If not, then find a free frame of memory to hold the missing page.
  • Is the missing page in an object on backing storage? Figure out

where: index the inode block map. Fetch page into the frame.

  • Or: is it the first reference to a page in a zero-fill object (e.g., stack or

heap)? Then fill the frame with zeros.

  • So far so good? Install a translation in the page table entry (pte)

mapping the faulted virtual page to its frame.

  • Adjust PC to restart faulted instruction, and return to user mode.
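The zero-fill path through these steps can be simulated in user code. This is a toy under loud assumptions: a single-level page table, tiny 16-byte pages, no backing store, and no eviction, so the handler simply fails when frames run out. All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define NPAGES        8
#define NFRAMES       4
#define TOY_PAGE_SIZE 16     /* tiny pages keep the toy small */

struct pte { int valid; int pfn; };

struct pte page_table[NPAGES];
unsigned char memory[NFRAMES][TOY_PAGE_SIZE];
int next_free_frame = 0;
int fault_count = 0;

/* Handle the first reference to a zero-fill page: find a free frame,
 * fill it with zeros, and install the translation. Returns 0 on
 * success, -1 if no frame is free (a real kernel would evict). */
static int handle_fault(int vpn)
{
    if (next_free_frame >= NFRAMES)
        return -1;
    int pfn = next_free_frame++;
    memset(memory[pfn], 0, TOY_PAGE_SIZE);
    page_table[vpn].pfn = pfn;
    page_table[vpn].valid = 1;
    fault_count++;
    return 0;
}

/* Return a pointer to the byte at (vpn, offset), faulting the page in
 * on first touch. NULL models an error delivered to the process. */
unsigned char *touch(int vpn, int offset)
{
    if (vpn < 0 || vpn >= NPAGES || offset < 0 || offset >= TOY_PAGE_SIZE)
        return NULL;
    if (!page_table[vpn].valid && handle_fault(vpn) != 0)
        return NULL;
    return &memory[page_table[vpn].pfn][offset];   /* "restart" the access */
}
```

The first `touch(5, 3)` takes one fault and sees a zero byte; later touches of page 5 find the installed translation and take no fault.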
slide-51
SLIDE 51

Caches in Linux

slabinfo - version: 1.1
kmem_cache            59     78    100    2    2    1
ip_fib_hash           10    113     32    1    1    1
ip_conntrack           0      0    384    0    0    1
urb_priv               0      0     64    0    0    1
clip_arp_cache         0      0    128    0    0    1
ip_mrt_cache           0      0     96    0    0    1
tcp_tw_bucket          0     30    128    0    1    1
tcp_bind_bucket        5    113     32    1    1    1
tcp_open_request       0      0     96    0    0    1
inet_peer_cache        0      0     64    0    0    1
ip_dst_cache          23    100    192    5    5    1
arp_cache              2     30    128    1    1    1
blkdev_requests      256    520     96    7   13    1
dnotify cache          0      0     20    0    0    1
file lock cache        2     42     92    1    1    1
fasync cache           1    202     16    1    1    1
uid_cache              4    113     32    1    1    1
skbuff_head_cache     93     96    160    4    4    1
sock                 115    126   1280   40   42    1
sigqueue               0     29    132    0    1    1
cdev_cache           156    177     64    3    3    1
bdev_cache            69    118     64    2    2    1
mnt_cache             13     40     96    1    1    1
inode_cache         5561   5580    416  619  620    1
dentry_cache        7599   7620    128  254  254    1
dquot                  0      0    128    0    0    1
filp                1249   1280     96   32   32    1
names_cache            0      8   4096    0    8    1
buffer_head        15303  16920     96  422  423    1
mm_struct             47     72    160    2    3    1
vm_area_struct      1954   2183     64   34   37    1
fs_cache              46     59     64    1    1    1
files_cache           46     54    416    6    6    1

The columns are cache name, active objects, total number of objects, object size, number of full or partial pages, total allocated pages, and pages per slab.

slide-52
SLIDE 52

Page/block cache internals

Lookup: HASH(blockID) This is what a software-based cache looks like. Each frame/buffer of memory is described by a meta-object (header). Resident pages/blocks are accessible for lookup in a global hash table. An ordered list of eviction candidates winds through the hash chains. Hash table bucket array bucket lists free/eviction list Policy choices: which pages or blocks to keep in memory? Which to evict from memory to make room for others? How to handle writes?