Architectures, Microkernels, IPC, Capabilities
Jakub Jermář
jakub.jermar@kernkonzept.com
http://d3s.mff.cuni.cz/aosy
http://d3s.mff.cuni.cz

Agenda
Kernel architectures
Microkernels
IPC
Capabilities

Recall: Common OS Taxonomy
Special-purpose operating systems
  Real-time operating systems
  Hypervisors (type 1)
  ...
General-purpose operating systems
  Monolithic kernel
  Single-server microkernel
  Multiserver microkernel
  Hybrid kernel (?)

Monolithic Kernel
Figure: applications run in unprivileged mode on top of a monolithic kernel in privileged mode; the kernel contains memory management, the scheduler, IPC, device drivers, file system drivers, user management, the network stack, ...; everything runs on the hardware.

Some Obvious Issues
Security
  Applications trust all kernel components
  Kernel components trust all other kernel components
Reliability
  Kernel components are a single point of failure
Availability
  Kernel components cannot be updated independently
Justifiability
  Who says file systems, networking, device drivers, etc. belong in the kernel?

Some Obvious Issues (2)
Extensibility
  How to extend the system without modifying the kernel?
  Too many communication mechanisms
    Unix: pipes, files, shared memory, sockets, signals, System V IPC, System V shared memory, System V semaphores…
  The kernel has many built-in policies
Software design principles
  Interfaces between kernel components are usually implicit, not well-defined

Single-server Microkernel
Figure: applications in unprivileged mode; a microkernel (memory management, scheduler, IPC) in privileged mode; a single system server in unprivileged mode containing device drivers, file system drivers, user management, the network stack, ...; everything runs on the hardware.

Multiserver Microkernel
Figure: applications in unprivileged mode; a microkernel (memory management, scheduler, IPC) in privileged mode; separate servers in unprivileged mode: a naming server, a location server, several device driver servers, several file system driver servers, a device multiplexer, a file system multiplexer, the network stack, a security server, ...; everything runs on the hardware.

Examples
Monolithic kernel
  Linux, Solaris (UTS), Windows, FreeBSD, NetBSD, OpenBSD, OpenVMS, MS-DOS, RISC OS
Microkernel (the microkernel on its own)
  CMU Mach, GNU Mach, L4::Pistachio, Fiasco.OC, seL4
Single-server microkernel
  CMU Mach (with 4.3BSD server), MkLinux, L4Linux
Multiserver microkernel
  L4Re, HelenOS, MINIX 3, Genode, GNU/Hurd

Multiserver Microkernel (reprise)
Figure: the multiserver microkernel diagram, repeated.

Hypervisor (Type 1)
Figure: a type 1 hypervisor (memory management, scheduler, communication) runs in hyper-privileged mode on the hardware; guests run above it in privileged mode.

Common Cloud Deployment
Figure: a hypervisor (memory management, scheduler, communication) in hyper-privileged mode on the hardware, with guests above it in privileged mode.

Unikernel
Figure: a hypervisor (memory management, scheduler, communication) in hyper-privileged mode on the hardware; above it, in privileged mode, three unikernels, each consisting of a kernel component and an application component.

Unikernel (2)
Library operating system
  Approach to building operating systems
Unikernel
  Architecture
  Binary artifact

Unikernel (3)
Library operating system
  Payload (application) merged with the kernel
  The kernel component acts as a library providing access to the hardware, threading, file systems, etc.
  Only the necessary functionality
  Mostly static (single image), but there are dynamic variants
  Code runs in privileged (or less privileged) mode, in a single address space
    No mode switches and no address-space switches
    Syscalls can be replaced by function calls
  Isolation/security provided by the underlying hypervisor (more privileged mode)

Unikernel (4)
Madhavapeddy, A., Scott, D. J.: Unikernels: Rise of the Virtual Library Operating System, ACM Queue, 2013
MirageOS
  University of Cambridge, Docker
  Clean-slate components written in OCaml
  Used in Docker for Mac, VPNKit

Unikernel (5)
Porter, D. E., et al.: Rethinking the library OS from the top down, ASPLOS, 2011
Drawbridge
  Microsoft Research (2011–?)
  Librarified Windows
  Used in MSSQL Server for Linux (2016)

Kantee, A.: The Rise and Fall of the Operating System, ;login:, October 2015, Vol. 40, No. 5
Rump kernel
  Librarified NetBSD
  Popular source of components for various kernels (NetBSD, rumprun, Hurd, Genode, …)

Future Hardware Predictions
More of
  Complex interconnects & cache hierarchies
    Cache-coherency protocols even more expensive
  Diversity
    Different cores together → the same optimizations won't work anymore
  Heterogeneity
    Multiple ISAs → can't have a single-image OS
Less of / lack of
  Cache coherency
  Shared memory

Options for General-Purpose OSes
Resign
  Make it easy to build specialized OSes
  Unikernels
Redesign
  Attack the problem from a different angle
  Multikernels

Implicit Message Passing in Hardware
Figure: CPUs with private L1 caches and shared L2 caches above memory holding shared data; one CPU writes the shared data and a CPU elsewhere in the hierarchy reads it, so the update travels through the caches as an implicit message.

Multikernel Paradigm Shift
Inside the OS layer
Do not assume coherent shared-memory SMP
  If available, use it to optimize message passing
No implicit inter-core state sharing
Simple, single-threaded, event-driven code
Explicit inter-core communication via message passing
Global state replicas maintained by distributed algorithms

Multikernel
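The rules above can be illustrated with a toy, single-threaded simulation: each per-core kernel owns a private replica of the global state, and the only cross-core interaction is putting an explicit message into another core's inbox, which that core handles in its event loop. All names (update, poll_core, the mailbox layout) are hypothetical. The naive broadcast gives no ordering between concurrent updates from different cores; a real multikernel has to resolve that with a distributed agreement algorithm, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

#define NCORES 4   /* toy machine size */
#define QLEN   16  /* inbox capacity per core */
#define NKEYS  8   /* size of the replicated state */

struct msg { int key; int value; };   /* "set key to value" */

/* Per-core inbox: the only per-core structure another core may touch. */
struct mailbox {
    struct msg q[QLEN];
    size_t head, tail;
};

/* One per-core kernel: a private replica of the global state and an inbox. */
struct core {
    int replica[NKEYS];
    struct mailbox inbox;
};

static int mb_put(struct mailbox *mb, struct msg m)
{
    if (mb->tail - mb->head == QLEN)
        return -1;                      /* inbox full */
    mb->q[mb->tail++ % QLEN] = m;
    return 0;
}

static int mb_get(struct mailbox *mb, struct msg *m)
{
    if (mb->head == mb->tail)
        return -1;                      /* inbox empty */
    *m = mb->q[mb->head++ % QLEN];
    return 0;
}

/* Core `self` changes global state: apply the change to its own replica,
 * then send an explicit message to every other core instead of writing
 * their replicas directly. */
int update(struct core cores[], int self, int key, int value)
{
    assert(key >= 0 && key < NKEYS);
    cores[self].replica[key] = value;
    for (int i = 0; i < NCORES; i++) {
        if (i == self)
            continue;
        if (mb_put(&cores[i].inbox, (struct msg){ key, value }) != 0)
            return -1;                  /* toy: no retry, no rollback */
    }
    return 0;
}

/* One step of a core's single-threaded event loop: drain the inbox and
 * apply each update to the local replica.  Returns the number handled. */
int poll_core(struct core *c)
{
    struct msg m;
    int handled = 0;
    while (mb_get(&c->inbox, &m) == 0) {
        c->replica[m.key] = m.value;
        handled++;
    }
    return handled;
}
```

Until a core polls its inbox, its replica is stale; consistency is restored only by message handling, never by shared writes.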
Figure: each CPU runs its own kernel in privileged mode, with servers and applications in unprivileged mode; each kernel keeps its own state replica.

Multikernel (2)
The kernel/user-space boundary is not a defining characteristic
Baumann, A., et al.: The Multikernel: A new OS architecture for scalable multicore systems, SOSP '09
Barrelfish
  ETH Zürich, Microsoft Research

Inter-Process Communication
Sharing data between processes (tasks)
Crossing process isolation in a managed and predictable way
Technically, any means of sharing data can be considered IPC (e.g. files, networking, middleware)
  In monolithic systems, this usually works without a dedicated IPC mechanism
Crucial for microkernel systems
  In microkernel systems, even files and networking cannot be implemented without an IPC mechanism

Classical IPC
POSIX signals
Anonymous pipes
Named pipes
Sockets
POSIX shared memory
System V shared memory, IPC, semaphores
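As one concrete instance of the classical mechanisms above, here is an anonymous pipe between a parent and a forked child, using only POSIX calls. The helper name pipe_from_child is hypothetical. Both processes reach the pipe through file descriptors, local integers naming kernel objects.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A child process writes msg into an anonymous pipe and the parent reads it
 * into out (NUL-terminated).  Returns the number of bytes received, or -1. */
ssize_t pipe_from_child(const char *msg, char *out, size_t outlen)
{
    assert(outlen > 0);
    int fds[2];                       /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                   /* child: producer */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t written = write(fds[1], msg, len);
        close(fds[1]);
        _exit(written == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);                    /* parent: consumer, keeps only the read end */
    size_t total = 0;
    ssize_t n;
    while (total < outlen - 1 &&
           (n = read(fds[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t)n;           /* read() returns 0 once the child closes */
    close(fds[0]);
    out[total] = '\0';

    int status;
    if (waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)total;
}
```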

Capabilities
Capability
  Object identifying an OS resource
    Logical objects (open files, connections), typed memory areas (physical memory regions)
Capability reference
  Local user-space identification of a capability (file handles, virtual memory regions)
Operations with capabilities
  Invoking a method with a capability reference
    Permissible methods defined by the capability itself
  Giving a capability to someone else
  Revoking a previously given capability

Trivial Capability Example
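The capability operations listed above (invoking a method, giving a capability to someone else, revoking it) can be modeled in a few lines. This is a toy user-level sketch, not any real kernel's API. The slot index plays the capability reference and struct cap plays the capability. The rights bits (R_READ, R_WRITE), the parent link used to find derived copies on revocation, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8                      /* capability slots per task (toy limit) */
enum { R_READ = 1u, R_WRITE = 2u };   /* hypothetical access rights */

struct object { int value; };         /* stand-in for an OS resource */

/* Kernel-side capability: which object, which rights, and which capability
 * it was derived from (needed to find it again on revocation). */
struct cap {
    struct object *obj;               /* NULL means the slot is empty */
    unsigned rights;
    struct cap *parent;
};

/* Per-task capability table.  A capability reference is just a slot index,
 * meaningful only inside its own task, like a file descriptor. */
struct task { struct cap caps[NSLOTS]; };

static struct cap *lookup(struct task *t, int ref)
{
    if (ref < 0 || ref >= NSLOTS || t->caps[ref].obj == NULL)
        return NULL;
    return &t->caps[ref];
}

static int install(struct task *t, struct object *o, unsigned rights,
                   struct cap *parent)
{
    for (int s = 0; s < NSLOTS; s++)
        if (t->caps[s].obj == NULL) {
            t->caps[s] = (struct cap){ o, rights, parent };
            return s;                 /* the new capability reference */
        }
    return -1;                        /* table full */
}

int cap_create(struct task *t, struct object *o, unsigned rights)
{
    return install(t, o, rights, NULL);
}

/* Invoking methods: each one is permitted only if the capability allows it. */
int cap_read(struct task *t, int ref, int *out)
{
    struct cap *c = lookup(t, ref);
    if (c == NULL || !(c->rights & R_READ))
        return -1;
    *out = c->obj->value;
    return 0;
}

int cap_write(struct task *t, int ref, int value)
{
    struct cap *c = lookup(t, ref);
    if (c == NULL || !(c->rights & R_WRITE))
        return -1;
    c->obj->value = value;
    return 0;
}

/* Give a copy to another task with the same or fewer rights; returns the
 * reference that is valid in the receiving task. */
int cap_grant(struct task *from, int ref, struct task *to, unsigned rights)
{
    struct cap *c = lookup(from, ref);
    if (c == NULL || (rights & ~c->rights) != 0)
        return -1;                    /* unknown reference or rights amplification */
    return install(to, c->obj, rights, c);
}

static void revoke_derived(struct task **all, size_t n, const struct cap *target)
{
    for (size_t i = 0; i < n; i++)
        for (int s = 0; s < NSLOTS; s++) {
            struct cap *c = &all[i]->caps[s];
            if (c->obj != NULL && c->parent == target) {
                revoke_derived(all, n, c);   /* copies granted onwards first */
                *c = (struct cap){ 0 };      /* then empty this slot */
            }
        }
}

/* Revoke every capability derived from (owner, ref) in all tasks;
 * the owner keeps its own capability. */
void cap_revoke(struct task **all, size_t n, struct task *owner, int ref)
{
    assert(all != NULL || n == 0);
    struct cap *c = lookup(owner, ref);
    if (c != NULL)
        revoke_derived(all, n, c);
}
```

Revocation is transitive here: a copy that was granted onwards disappears together with the copy it came from, so a recipient cannot escape revocation by re-granting.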
Figure: in user space, read(0, ...); names file descriptor 0, a capability reference; in kernel space, the descriptor indexes the task's file descriptor table, whose entries (the capabilities) point to vfs_file_t objects.

Trivial Capability Example (2)
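The sendmsg()/recvmsg() exchange on this slide elides the message setup. A complete version passes a descriptor over a UNIX-domain socket as SCM_RIGHTS ancillary data; the kernel then installs a new entry in the receiver's descriptor table that refers to the same open file. The helper names send_fd and recv_fd are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd over the UNIX-domain socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                  /* at least one byte of ordinary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                           /* control buffer, aligned for cmsghdr */
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;     /* payload is an array of descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memmove(CMSG_DATA(cmsg), &fd, sizeof(fd));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(). Returns the new local fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    /* Inspect the control data only after recvmsg() has filled it in. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memmove(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;                        /* a new slot in this process's table */
}
```

For brevity both ends can live in one process (e.g. a socketpair()); across processes the code is the same, and the received number is generally different from the sender's, since descriptors are local names.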
Sender:
    struct msghdr msg;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    // ...
    memmove(CMSG_DATA(cmsg), &fd, sizeof(fd));
    sendmsg(socket, &msg, 0);
Receiver:
    struct msghdr msg;
    // ...
    recvmsg(socket, &msg, 0);
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    int fd;
    memmove(&fd, CMSG_DATA(cmsg), sizeof(fd));
Figure: the sender's descriptor table entry and a new entry in the receiver's descriptor table both refer to the same vfs_file_t.

L4 IPC Before Capabilities
L4::Pistachio
    L4_Msg_t msg;
    L4_MsgClear(&msg);
    L4_Set_MsgLabel(&msg, LABEL); // set user

Issues with Global IDs
Prevent unauthorized clients
  A global ID can be guessed, even if officially unknown
  Example: MINIX 3 communication control
    Ordinary user processes are allowed to communicate only with POSIX servers
    Services and drivers use a policy configured in a file
  Example: L4 v2 Chiefs and Clans
    Threads can communicate with all threads in their own clan
    Inter-clan communication must go through the chief threads
Permission checks
  Failed checks can still DoS the server
  Decide who can do what
Difficult to interpose
  The global ID identifies the communicating parties

Capabilities Trump Global IDs
Prevent unauthorized clients
  Only authorized clients have the capability
Permission checks
  Possession of the capability is the authorization to access the resource
  Can have different capabilities for different access modes to the same resource
Easy to interpose
  All names are local
  Communicating parties don't know each other

L4 IPC with Capabilities
Fiasco.OC
    l4_msg_regs_t *mr = l4_utcb_mr();
    mr

Fiasco.OC IPC
    l4_msgtag_t l4_ipc(l4_cap_idx_t dest, l4_utcb_t *utcb, l4_umword_t flags, l4_umword_t slabel, l4_msgtag_t tag, l4_umword_t *rlabel, l4_timeout_t timeout);
SEND: send to the specified destination
RECV: receive from the specified destination
CALL (SEND | RECV): send, create a reply capability and receive
WAIT (OPEN_WAIT | RECV): receive from any possible sender
SEND_AND_WAIT (SEND | OPEN_WAIT | RECV)
REPLY | SEND: send to the reply capability
REPLY | SEND | RECV: send to the reply capability and receive
REPLY_AND_WAIT (REPLY | SEND | OPEN_WAIT | RECV)

Fiasco.OC Client/Server IPC Example
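In a client/server exchange the client typically uses CALL and the server loops on REPLY_AND_WAIT. Every operation in the l4_ipc() flag table is a combination of four basic flags. The sketch below encodes that table. The IPC_* names and bit values are hypothetical and chosen for illustration, because Fiasco.OC defines its own constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; Fiasco.OC defines its own constants. */
enum {
    IPC_SEND      = 1 << 0,
    IPC_RECV      = 1 << 1,
    IPC_OPEN_WAIT = 1 << 2,   /* receive from any sender, not just dest */
    IPC_REPLY     = 1 << 3,   /* send goes to the reply capability */

    /* Composite operations from the l4_ipc() flag table */
    IPC_CALL           = IPC_SEND | IPC_RECV,
    IPC_WAIT           = IPC_OPEN_WAIT | IPC_RECV,
    IPC_SEND_AND_WAIT  = IPC_SEND | IPC_OPEN_WAIT | IPC_RECV,
    IPC_REPLY_AND_WAIT = IPC_REPLY | IPC_SEND | IPC_OPEN_WAIT | IPC_RECV,
};

static_assert(IPC_CALL == (IPC_SEND | IPC_RECV), "CALL = send phase + receive phase");

/* Describe a flag combination, or return NULL if it is not in the table. */
const char *ipc_op_name(unsigned flags)
{
    switch (flags) {
    case IPC_SEND:                        return "send to destination";
    case IPC_RECV:                        return "receive from destination";
    case IPC_CALL:                        return "send, create reply capability, receive";
    case IPC_WAIT:                        return "receive from any sender";
    case IPC_SEND_AND_WAIT:               return "send, then receive from any sender";
    case IPC_REPLY | IPC_SEND:            return "send to reply capability";
    case IPC_REPLY | IPC_SEND | IPC_RECV: return "send to reply capability and receive";
    case IPC_REPLY_AND_WAIT:              return "reply, then receive from any sender";
    default:                              return NULL;
    }
}
```

Composing one system call from independent send and receive phases lets a server answer one client and wait for the next in a single kernel entry.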
    l4_msg_regs_t *mr = l4_utcb_mr();
    int a = 1;
    int b = 1;
    f

Fiasco.OC IPC (2)
    l4_msgtag(label, words, items, flags)
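Because the label, the word count, the item count and the flags travel together, a message tag fits naturally into one machine word. The sketch below uses a made-up bit layout, since Fiasco.OC defines its own. The names msgtag_t, make_msgtag and tag_* are hypothetical and do not belong to the l4sys API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy message tag carrying all four fields in one word.
 * Assumed layout (illustration only):
 *   bits 0-5 words | bits 6-11 items | bits 12-15 flags | bits 16-31 label */
typedef struct { uint32_t raw; } msgtag_t;

msgtag_t make_msgtag(uint32_t label, unsigned words, unsigned items,
                     unsigned flags)
{
    assert(label <= 0xffff && words < 64 && items < 64 && flags < 16);
    msgtag_t t = { (label << 16) | ((uint32_t)flags << 12) |
                   ((uint32_t)items << 6) | (uint32_t)words };
    return t;
}

uint32_t tag_label(msgtag_t t) { return t.raw >> 16; }          /* protocol, error code */
unsigned tag_words(msgtag_t t) { return t.raw & 0x3f; }         /* untyped words in UTCB */
unsigned tag_items(msgtag_t t) { return (t.raw >> 6) & 0x3f; }  /* typed items in UTCB */
unsigned tag_flags(msgtag_t t) { return (t.raw >> 12) & 0xf; }
```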
Label: user-defined label, e.g. protocol number, error code
Words: number of untyped words stored in the UTCB
Items: number of typed items stored in the UTCB (capabilities, mappings)
Flags

Fiasco.OC IPC (3)
l4_umword_t slabel, *rlabel
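The receive label can hold a pointer to the server object, which lets a server find the target of a request without a lookup table. The sketch below assumes that the server bound each IPC gate with the address of its object as the label, so the label delivered with each message identifies that object. The binding step is not shown, and struct received, dispatch and the opcodes are hypothetical. The direct cast from label to pointer is a simplification made for this toy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A hypothetical server-side object (one per IPC gate). */
struct counter { int value; };

/* What the toy server loop sees after a receive: the label delivered with
 * the message plus the untyped message words. */
struct received {
    uintptr_t rlabel;   /* label of the gate the message arrived through */
    int opcode;
    int arg;
};

enum { OP_ADD = 1, OP_GET = 2 };    /* hypothetical protocol */

/* Dispatch on the receive label: it holds the address of the server object,
 * so the server knows which object the client invoked. */
int dispatch(const struct received *m)
{
    struct counter *obj = (struct counter *)m->rlabel;
    if (obj == NULL)
        return -1;                  /* label 0: not bound to an object */
    switch (m->opcode) {
    case OP_ADD:
        obj->value += m->arg;
        return obj->value;
    case OP_GET:
        return obj->value;
    default:
        return -1;                  /* unknown opcode */
    }
}
```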
Send label
  User-defined label copied to the recipient
  Used to hold the sender thread ID before capabilities
  Mostly zero these days
Receive label
  User-defined label copied from the sender
  Usually zero
  Bound IPC gates and attached IRQ objects modify the label
  Can be used, e.g., to store a pointer to the server object

IPC Marshalling
By hand
Interface Definition Language
  The IDL compiler generates client and server stubs from the interface description in IDL
  Overkill for microkernels
    Need just one language, one architecture
    Advanced constructs are not used in microkernels
    The IDL compiler is often bigger than the microkernel

IPC Marshalling
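Marshalling by hand amounts to copying each argument into the message words in an agreed order, and a stream-style interface removes the need to track offsets explicitly. The sketch below is a C approximation of that idea and not the L4Re stream API. The msgbuf layout, MSG_WORDS, the opcode and the add() RPC are hypothetical. In a real system the kernel would copy the words from the client's UTCB to the server's, whereas here both stubs share one buffer.

```c
#include <assert.h>
#include <stddef.h>

#define MSG_WORDS 8   /* hypothetical number of message registers */

/* Toy stand-in for the UTCB message registers plus a stream cursor. */
struct msgbuf {
    long mr[MSG_WORDS];
    size_t pos;       /* next word to write or read; after marshalling,
                         this is the word count a sender would put in the tag */
    int error;        /* set on overflow/underflow */
};

static void put_word(struct msgbuf *b, long w)
{
    if (b->pos >= MSG_WORDS) { b->error = 1; return; }
    b->mr[b->pos++] = w;
}

static long get_word(struct msgbuf *b)
{
    if (b->pos >= MSG_WORDS) { b->error = 1; return 0; }
    return b->mr[b->pos++];
}

enum { OP_ADD = 1 };  /* hypothetical opcode */

/* Client stub for a hypothetical add(x, y) RPC: opcode, then arguments. */
void marshal_add(struct msgbuf *b, long x, long y)
{
    b->pos = 0;
    b->error = 0;
    put_word(b, OP_ADD);
    put_word(b, x);
    put_word(b, y);
}

/* Server stub: read the words back in exactly the same order. */
int unmarshal_add(struct msgbuf *b, long *result)
{
    b->pos = 0;
    b->error = 0;
    long op = get_word(b);
    long x = get_word(b);
    long y = get_word(b);
    if (b->error || op != OP_ADD)
        return -1;
    *result = x + y;
    return 0;
}
```

The only contract between the two stubs is the order of the words, which is what a small IDL or a C++ stream interface would otherwise generate or enforce.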
Stream-based IPC
    template <typename T> Ipc_client &
C++11 IDL (parameter packs, ...)
    struct F

L4Re Client/Server RPC Example
    L4::Cap<F

Fiasco.OC Object Model
Kernel objects
L4::Thread
L4::Task
L4::Ipc_gate
  Object for implementing userspace objects
L4::Irq
L4::Semaphore
L4::Scheduler
L4::Factory
  Creates new kernel objects subject to factory quota
L4::Vcon

Fiasco.OC Object Model (2)
Capabilities
  Typed by kernel/user object
  Capability selectors / slots allocated in userspace
    Like in seL4
    Unlike in HelenOS, Mach, file descriptors
  Mapped to a kernel object upon object creation
  Can be sent via IPC as a typed item
  Can be mapped to a task via its capability
Syscall
  Invocation of a capability via IPC

New Object Creation in L4Re / Fiasco.OC
Figure: in user space, the task's object space with capability slots 1, 2, 3, 4, one of which holds a factory capability; in kernel space, the L4::Factory object it refers to.
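The creation step can be modeled with a toy: the task first picks a free capability selector itself, since selectors are allocated in userspace, and then asks the factory to create an object into that slot, subject to the factory's quota. In L4Re / Fiasco.OC this request is an IPC invocation of the factory capability. The sketch below replaces the IPC with a direct function call, and all names (alloc_selector, factory_create, the obj_type values) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NSLOTS 8                                      /* toy object-space size */

enum obj_type { OBJ_IPC_GATE, OBJ_IRQ, OBJ_THREAD };  /* toy subset */

struct kobj { enum obj_type type; };                  /* stand-in kernel object */

/* A task's object space: selector (slot index) -> kernel object. */
struct task { struct kobj *slot[NSLOTS]; };

/* The factory's kernel object, with the quota it may still spend. */
struct factory { unsigned quota; };

/* User-space side: the task picks a free capability selector itself. */
int alloc_selector(const struct task *t)
{
    for (int i = 0; i < NSLOTS; i++)
        if (t->slot[i] == NULL)
            return i;
    return -1;
}

/* Kernel side of "create": validate the target slot and the quota, create
 * the object, charge the quota and map the object into slot `sel`. */
int factory_create(struct factory *f, struct task *t, int sel,
                   enum obj_type type)
{
    if (sel < 0 || sel >= NSLOTS || t->slot[sel] != NULL)
        return -1;                    /* invalid or occupied selector */
    if (f->quota == 0)
        return -1;                    /* factory quota exhausted */
    struct kobj *o = malloc(sizeof *o);
    if (o == NULL)
        return -1;
    o->type = type;
    f->quota--;
    t->slot[sel] = o;
    return 0;
}

/* Tear down the task's object space (frees the toy objects). */
void task_destroy(struct task *t)
{
    for (int i = 0; i < NSLOTS; i++) {
        free(t->slot[i]);
        t->slot[i] = NULL;
    }
}
```

Because the caller names the destination slot, the kernel never has to pick or return a selector, and the quota bounds how much kernel memory one factory holder can consume.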
    aut

References