Lessons Learned from Porting HelenOS to RISC-V



SLIDE 1

Lessons Learned from Porting HelenOS to RISC-V

Martin Děcký, martin@decky.cz

February 2019

SLIDE 2

Martin Děcký, FOSDEM, February 2nd 2019, Lessons Learned from Porting HelenOS to RISC-V

Who Am I

Passionate programmer and operating systems enthusiast

With a specific inclination towards multiserver microkernels

HelenOS developer since 2004

Research Scientist from 2006 to 2018

Charles University (Prague), Distributed Systems Research Group

Senior Research Engineer since 2017

Huawei Technologies (Munich), German Research Center, Central Software Institute, OS Kernel Lab

SLIDE 3

SLIDE 4

HelenOS in a Nutshell

Open source general-purpose multiplatform microkernel multiserver operating system designed and implemented from scratch

SLIDE 5

HelenOS in a Nutshell

Custom microkernel

Custom user space

http://www.helenos.org

SLIDE 6

HelenOS in a Nutshell

3-clause BSD permissive license

https://github.com/HelenOS

SLIDE 7

HelenOS in a Nutshell

Breadth-first rather than depth-first

Potentially targeting server, desktop and embedded

SLIDE 8

HelenOS in a Nutshell

IA-32 (x86), AMD64 (x86-64), IA-64 (Itanium), ARM, MIPS, PowerPC, SPARCv9 (UltraSPARC)

SLIDE 9

HelenOS in a Nutshell

Fine-grained modular component architecture

No monolithic components even in user space

SLIDE 10

HelenOS in a Nutshell

Architecture based on a set of guiding design principles

Asynchronous bi-directional IPC with rich semantics

SLIDE 11

SLIDE 12

Motivation: Software Dependability

How does HelenOS try to achieve dependability?

Microkernel multiserver architecture based on design principles

Fundamental fault isolation (limiting the “blast radius”)

Explicit mapping between design and implementation

Clean, manageable, understandable and auditable source code

“Code is written once, but read many times”

Ratio of comments: 38 %

“Extremely well-commented source code” (Open Hub)

Work in progress: Formal verification

SLIDE 13

Motivation: Software Dependability

High-quality architecture

High-quality implementation

Verification of correctness

Development process

SLIDE 14

[Figure: monolithic OS compared with HelenOS]

SLIDE 15

HelenOS Microkernel Functional Blocks

[Figure: HelenOS microkernel functional blocks; legend: architecture independent, shared architecture dependent, architecture dependent]

Architecture dependent: bootstrap routines, CPU mgmt, atomics & barriers, I/O mgmt, platform memory mgmt, platform drivers, debugging support, context switching, interrupt handling, platform library routines

Shared architecture dependent: shared platform drivers, shared debugging support, hierarchical page table support, global page hash table support

Hardware abstraction layer

Architecture independent: kernel unit tests, memory backends, memory zones mgmt, frame allocator, slab allocator, address space mgmt, memory reservation, spinlocks, wait queues, work queues, interrupt & syscall dispatch, thread scheduler, thread & task mgmt, kernel lifecycle mgmt, lists/trees/bitmaps, concurrent hash table, generic resource allocator, ELF loader, string routines, misc routines, kernel debug console, IPC, kernel log, hardware resource mgmt, system information, cycle & time mgmt, tracing support, read-copy-update, capabilities, cache coherency synchronization interface

SLIDE 16

HelenOS Logical Architecture

[Figure: HelenOS logical architecture]

device manager, device drivers, client session, vterm, bdsh, vfs

file system drivers: FAT, exFAT, ext4, ISO 9660, UDF, MINIX FS, TMPFS, Location FS

kernel, naming service, loader, task monitor, klog, location service, logger, init

transport layer protocols: tcp, udp

link layer protocols: loopip, ethip, slip

inetsrv

networking management: dnsrsrv, dhcp, nconfsrv

human interface: clipboard, audio output, input, console, compositor, remote console, remote framebuffer

SLIDE 17

HelenOS RISC-V Port Status

January 2016

Infrastructure, boot loader, initial virtual memory setup, kernel hand-off

Privileged ISA Specification version 1.7, toolchain support not upstreamed yet

Targeting Spike

18 hours net development time

Initial experience

Many things besides the ISA itself were not nicely documented (e.g. ABI, HTIF) and had to be reverse-engineered from Spike

Even some ISA details were sketchy (memory consistency model)

Generally speaking, the ISA itself looked nice (except the compressed page protection field)

SLIDE 18

HelenOS RISC-V Port Status (2)

August 2017

Basic kernel functionality (interrupt/exception handling, context switching, atomics, basic I/O)

Privileged ISA Specification version 1.10

Some minor improvements (e.g. more standard page protection bits)

Still targeting Spike

Observation: The HTIF input device has a horrible design

– No interrupts

– Polling requests are buffered

Still no decent “reference platform”

24 hours net development time

SLIDE 19

SLIDE 20

HelenOS RISC-V Port Status (3)

January 2019

Towards user space support

Switching to QEMU virt target

Looks more reasonable than Spike: CLINT, PLIC, NS16550 UART, VirtIO

Toolchain support upstream

8 hours net development time

SLIDE 21

Lessons Learned

Surprisingly little interest in porting HelenOS to RISC-V

Compared to previous porting efforts to ARM, SPARCv9, SPARCv8, etc.

GSoC, master thesis, team software project: all to no avail

Possible reasons

Lack of a feature-rich reference platform

Lack of an easily available development board

Something like a Raspberry Pi (USB, Ethernet, HDMI, sound), but with a RISC-V CPU supporting the Supervisor mode

Despite RISC-V being a new major ISA, there is surprisingly little input from operating system research

SLIDE 22

Problem Statement

Microkernel design ideas go back as far as 1969

RC 4000 Multiprogramming System nucleus (Per Brinch Hansen)

Isolation of unprivileged processes, inter-process communication, hierarchical control

The design has obvious benefits for safety, security, dependability, formal verification, etc.

Hardware and software used to be designed independently

Designing CPUs used to be an extremely complicated and costly process

Operating systems used to be written after the CPUs were designed

Hardware designs used to be rather conservative

SLIDE 23

Monolithic OS Design is Flawed

Biggs S., Lee D., Heiser G.: The Jury Is In: Monolithic OS Design Is Flawed: Microkernel-based Designs Improve Security, ACM 9th Asia-Pacific Workshop on Systems (APSys), 2018

“While intuitive, the benefits of the small TCB have not been quantified to date. We address this by a study of critical Linux CVEs, where we examine whether they would be prevented or mitigated by a microkernel-based design. We find that almost all exploits are at least mitigated to less than critical severity, and 40 % completely eliminated by an OS design based on a verified microkernel, such as seL4.”

SLIDE 24

HelenOS IPC Example

[Figure: HelenOS IPC example involving a client, VFS, tmpfs and the naming service]

SLIDE 25

Where Could RISC-V Really Help?

Mainstream ISAs used to be designed in a rather conservative way

Can you name some really revolutionary ISA features since the IBM System/370 Advanced Function?

Requirements on new ISAs usually follow the needs of the mainstream operating systems running on past ISAs

No wonder microkernels suffer performance penalties compared to monolithic systems

The more fine-grained the architecture, the more penalties it suffers

Let us design the hardware with microkernels in mind!

SLIDE 26

Any Ideas?

SLIDE 27

Communication between Address Spaces

Control and data flow between subsystems

Monolithic kernel

Function calls

Passing arguments in registers and on the stack

Passing direct pointers to memory structures

Multiserver microkernel

IPC via microkernel syscalls

Passing arguments in a subset of registers

Privilege level switch, address space switch

Scheduling (in case of asynchronous IPC)

Data copying or memory sharing with page granularity
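To make the microkernel side of this comparison concrete, here is a minimal sketch assuming a hypothetical IPC message that carries an operation number plus at most 5 word-sized arguments in registers, with anything larger falling back to copying or to sharing 4 KiB pages from a page-aligned buffer. The constants, `ipc_msg_t`, `ipc_xfer_kind` and `ipc_pages_needed` are illustrative assumptions, not the actual HelenOS ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative assumptions only; these are not the HelenOS ABI values. */
#define IPC_ARG_REGS     5u     /* word-sized arguments passed in registers */
#define SHARE_PAGE_SIZE  4096u  /* granularity of memory sharing in bytes */

/* Hypothetical register-sized IPC message: an operation plus arguments. */
typedef struct {
    uintptr_t method;               /* requested operation */
    uintptr_t args[IPC_ARG_REGS];   /* payload carried in registers */
} ipc_msg_t;

typedef enum {
    XFER_REGISTERS,  /* payload fits into the argument registers */
    XFER_SHARE       /* payload needs copying or page-granular sharing */
} xfer_kind_t;

/* Decide how a payload of `len` bytes would travel between address spaces. */
xfer_kind_t ipc_xfer_kind(size_t len)
{
    return (len <= IPC_ARG_REGS * sizeof(uintptr_t)) ? XFER_REGISTERS : XFER_SHARE;
}

/* Whole pages needed to share `len` bytes from a page-aligned buffer. */
size_t ipc_pages_needed(size_t len)
{
    return (len + SHARE_PAGE_SIZE - 1) / SHARE_PAGE_SIZE;
}
```

With these assumed values a 16-byte request travels in registers, while a 4097-byte buffer already needs two pages that have to be set up and torn down. A monolithic kernel would simply pass a pointer in either case.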

SLIDE 28

Communication between Address Spaces (2)

Is the kernel round-trip of the IPC necessary?

Suggestion for synchronous IPC: Extended Jump/Call and Return instructions that also switch the address space

Communicating parties identified by a “call gate” (capability) containing the target address space and the PC of the IPC handler (implicit for return)

Call gates stored in a TLB-like hardware cache (CLB)

CLB populated by the microkernel similarly to a TLB-only memory management architecture
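The call-gate proposal can be modelled in software to make the division of labour concrete. What follows is a minimal sketch assuming a hypothetical 8-entry, fully associative CLB with round-robin replacement. `call_gate_t`, `clb_lookup` and `clb_fill` are invented names, and no such RISC-V instruction or structure exists. A hit yields the target address space and handler PC without entering the kernel. A miss is the point where hardware would trap to the microkernel, which validates the capability and refills the entry, much like a software-managed TLB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLB_ENTRIES 8  /* hypothetical CLB size */

/* A call gate: where an extended Call instruction would transfer control. */
typedef struct {
    bool valid;
    uint64_t cap;        /* capability naming the gate */
    uint32_t asid;       /* target address space identifier */
    uintptr_t entry_pc;  /* PC of the IPC handler in the target space */
} call_gate_t;

typedef struct {
    call_gate_t entry[CLB_ENTRIES];
    size_t next_victim;  /* round-robin replacement pointer */
} clb_t;

/* "Hardware" side: find the gate; false models a CLB miss (trap). */
bool clb_lookup(const clb_t *clb, uint64_t cap, call_gate_t *out)
{
    for (size_t i = 0; i < CLB_ENTRIES; i++) {
        if (clb->entry[i].valid && clb->entry[i].cap == cap) {
            *out = clb->entry[i];
            return true;
        }
    }
    return false;
}

/* Microkernel side: after validating the capability, install the gate. */
void clb_fill(clb_t *clb, uint64_t cap, uint32_t asid, uintptr_t entry_pc)
{
    call_gate_t *slot = &clb->entry[clb->next_victim];
    slot->valid = true;
    slot->cap = cap;
    slot->asid = asid;
    slot->entry_pc = entry_pc;
    clb->next_victim = (clb->next_victim + 1) % CLB_ENTRIES;
}
```

In this model the common path (a hit) never involves the kernel. Only misses, which evict older gates once the CLB is full, require microkernel intervention.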

Suggestion for asynchronous IPC: Using CPU cache lines as the buffers for the messages

Async Jump/Call, Async Return and Async Receive instructions

Using the CPU cache like an extended register stack engine
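As a rough software analogue of the asynchronous proposal, the sketch below lays out a message so that it occupies exactly one cache line and queues such messages in a small ring. `async_send` and `async_receive` merely stand in for the proposed Async Call and Async Receive instructions. The 64-byte line size, the 4-slot ring depth and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */
#define RING_SLOTS 4   /* assumed number of in-flight messages (power of 2) */

/* One message fills exactly one cache line: 8 words of 8 bytes. */
typedef struct {
    alignas(CACHE_LINE) uint64_t word[CACHE_LINE / sizeof(uint64_t)];
} line_msg_t;

_Static_assert(sizeof(line_msg_t) == CACHE_LINE, "message must fit one line");

typedef struct {
    line_msg_t slot[RING_SLOTS];
    unsigned head;  /* next slot to receive from */
    unsigned tail;  /* next slot to send into */
} msg_ring_t;

/* Stand-in for Async Call: enqueue without waiting for the receiver. */
bool async_send(msg_ring_t *r, const line_msg_t *m)
{
    if (r->tail - r->head == RING_SLOTS)
        return false;  /* all lines in use: the sender would have to wait */
    memcpy(&r->slot[r->tail % RING_SLOTS], m, sizeof *m);
    r->tail++;
    return true;
}

/* Stand-in for Async Receive: dequeue the oldest pending message. */
bool async_receive(msg_ring_t *r, line_msg_t *out)
{
    if (r->head == r->tail)
        return false;  /* nothing pending */
    memcpy(out, &r->slot[r->head % RING_SLOTS], sizeof *out);
    r->head++;
    return true;
}
```

In hardware, the copies would ideally disappear: the line itself would change ownership from sender to receiver.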

SLIDE 29

Communication between Address Spaces (3)

Bulk data

Observation: Memory sharing is actually quite efficient for large amounts of data (multiple pages)

Overhead is caused primarily by creating and tearing down the shared pages

Data needs to be page-aligned

Sub-page granularity and dynamic data structures

Suggestion: Using CPU cache lines as shared buffers

Much finer granularity than pages (typically 64 to 128 bytes)

A separate virtual-to-cache mapping mechanism before the standard virtual-to-physical mapping
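The granularity argument is easy to quantify. Assuming 4 KiB pages and 64-byte cache lines (the lower end of the range above) and a buffer starting at a unit boundary, the helpers below compute how many bytes a buffer of a given size actually occupies under each sharing mechanism. For a 200-byte structure that is 256 bytes with cache lines, versus 4096 bytes with a page.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u  /* assumed page size */
#define LINE_BYTES 64u    /* assumed cache line size */

/* Round `len` up to a multiple of `unit` (unit > 0). */
size_t round_up(size_t len, size_t unit)
{
    return (len + unit - 1) / unit * unit;
}

/* Bytes occupied when sharing `len` bytes with page granularity. */
size_t shared_bytes_page(size_t len)
{
    return round_up(len, PAGE_BYTES);
}

/* Bytes occupied when sharing `len` bytes with cache-line granularity. */
size_t shared_bytes_line(size_t len)
{
    return round_up(len, LINE_BYTES);
}
```

For a 1 MiB buffer, both functions return the same value, which matches the observation that page sharing is already efficient for bulk data.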

SLIDE 30

Fast Context Switching

Current microsecond-scale latency hiding mechanisms

Hardware multi-threading

Effective

Does not scale beyond a few threads

Operating system context switching

Scales for any thread count

Too slow (order of 10 µs)

Goal: Finding a sweet spot between the two mechanisms

SLIDE 31

Fast Context Switching (2)

Suggestion: Hardware cache for contexts

Again, a mechanism similar to TLB-only memory management

Dedicated instructions for context store, context restore, context switch, context save, context load

Context data could potentially be ABI-optimized

Autonomous mechanism for event-triggered context switch (e.g. external interrupt)

Efficient hardware mechanism for latency hiding

The equivalent of fine/coarse-grained simultaneous multithreading

The software scheduler is in charge of setting the scheduler policy

The CPU is in charge of scheduling the contexts based on ALU, cache and other resource availability
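A software model helps pin down what such a context cache would have to do. This is a hypothetical sketch: the simplified register file (32 integer registers plus PC, no FP or CSR state), the 4-slot capacity, LRU replacement and every name are assumptions. Switching to a resident context is a hit with no memory traffic. A miss stores the least recently used context to memory and loads the requested one, which is the slow path a software context switch takes every time today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CTX_SLOTS 4   /* hypothetical number of on-chip context slots */
#define CTX_REGS  32  /* integer registers x0..x31, as in RV64I */

/* Architectural state of one software thread (simplified: no FP/CSRs). */
typedef struct {
    uint64_t reg[CTX_REGS];
    uint64_t pc;
} context_t;

typedef struct {
    bool valid;
    unsigned tid;       /* software thread identifier */
    uint64_t last_use;  /* timestamp for LRU replacement */
    context_t ctx;
} ctx_slot_t;

typedef struct {
    ctx_slot_t slot[CTX_SLOTS];
    uint64_t clock;     /* advances on every switch */
    unsigned misses;    /* switches that needed a memory store/load */
} ctx_cache_t;

/* Prefer a free slot; otherwise pick the least recently used one. */
static int pick_victim(const ctx_cache_t *c)
{
    int victim = 0;
    for (int i = 0; i < CTX_SLOTS; i++) {
        if (!c->slot[i].valid)
            return i;
        if (c->slot[i].last_use < c->slot[victim].last_use)
            victim = i;
    }
    return victim;
}

/* Switch to thread `tid`; `backing` holds the in-memory copy of all contexts. */
context_t *ctx_switch(ctx_cache_t *c, unsigned tid, context_t backing[])
{
    c->clock++;
    for (int i = 0; i < CTX_SLOTS; i++) {
        if (c->slot[i].valid && c->slot[i].tid == tid) {
            c->slot[i].last_use = c->clock;  /* hit: context already on chip */
            return &c->slot[i].ctx;
        }
    }

    int v = pick_victim(c);                        /* miss: slow path */
    c->misses++;
    if (c->slot[v].valid)
        backing[c->slot[v].tid] = c->slot[v].ctx;  /* context store */
    c->slot[v].ctx = backing[tid];                 /* context load */
    c->slot[v].tid = tid;
    c->slot[v].valid = true;
    c->slot[v].last_use = c->clock;
    return &c->slot[v].ctx;
}
```

In this model, software only decides which thread runs next (the policy). Whether the switch is cheap depends solely on residency, which is the property the proposed mechanism would exploit.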

SLIDE 32

User Space Interrupt Processing

Extension of the fast context switching mechanism

Efficient delivery of interrupt events to user space device drivers

Without the routine microkernel intervention

An interrupt could be directly handled by a preconfigured hardware context in user space

A clear path towards moving even the timer interrupt handler and the scheduler from kernel space to user space

Going back to interrupt-driven handling of peripherals with extremely low latency requirements (instead of polling)

The usual pain point: Level-triggered interrupts

Some coordination with the platform interrupt controller is probably needed to automatically mask the interrupt source
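A small simulation shows why level-triggered interrupts are the pain point. The line stays asserted until the device is serviced, so if the event were simply handed to a user-space context, it would be delivered again immediately. This is a minimal sketch with a hypothetical controller model and invented names. The controller masks the source at delivery time, and the user-space driver unmasks it only after clearing the device condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-triggered source and its controller. */
typedef struct {
    bool device_pending;  /* device holds the line asserted while true */
    bool masked;          /* controller mask bit for this source */
    unsigned deliveries;  /* events delivered to the user-space context */
} irq_line_t;

/* Controller: deliver only if the line is asserted and not masked.
 * Masking on delivery is the automatic step the slide calls for. */
bool irq_try_deliver(irq_line_t *l)
{
    if (!l->device_pending || l->masked)
        return false;
    l->masked = true;
    l->deliveries++;
    return true;
}

/* User-space driver: service the device, then re-enable the source. */
void driver_handle(irq_line_t *l)
{
    l->device_pending = false;  /* device-specific acknowledgement, modelled as a flag */
    l->masked = false;
}
```

Without the automatic mask in `irq_try_deliver`, the second delivery attempt would succeed while the driver is still running, producing an interrupt storm.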

SLIDE 33

Capabilities as First-Class Entities

Capabilities as unforgeable object identifiers

But eventually each access to an object needs to be bounds-checked and translated into the (flat) virtual address space

Suggestion: Embedding the capability reference in pointers

RV128 could provide 64 bits for the capability reference and 64 bits for the object offset

128-bit flat pointers are probably useless anyway
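Here is a sketch of the pointer encoding suggested for RV128: the upper 64 bits name a capability, the lower 64 bits are an offset into the object, and every access is bounds-checked and translated through a capability table. The 16-entry table layout and all names are assumptions made for illustration. The GCC/Clang `unsigned __int128` extension stands in for a native 128-bit register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 cap_ptr_t;  /* GCC/Clang extension: 128-bit value */

#define CAP_TABLE_SIZE 16  /* hypothetical capability table size */

/* Hypothetical capability table entry: object base and size. */
typedef struct {
    bool valid;
    uint64_t base;  /* flat virtual address of the object */
    uint64_t size;  /* object size in bytes */
} cap_entry_t;

/* Pack a capability reference (high 64 bits) and an offset (low 64 bits). */
cap_ptr_t cap_ptr_make(uint64_t cap, uint64_t offset)
{
    return ((cap_ptr_t) cap << 64) | offset;
}

uint64_t cap_ptr_cap(cap_ptr_t p)    { return (uint64_t) (p >> 64); }
uint64_t cap_ptr_offset(cap_ptr_t p) { return (uint64_t) p; }

/* Bounds-check an access of `len` bytes and translate it to a flat address.
 * Returning false models the exception hardware would raise. */
bool cap_translate(const cap_entry_t table[CAP_TABLE_SIZE], cap_ptr_t p,
                   uint64_t len, uint64_t *vaddr)
{
    uint64_t cap = cap_ptr_cap(p);
    uint64_t off = cap_ptr_offset(p);

    if (cap >= CAP_TABLE_SIZE || !table[cap].valid)
        return false;  /* no such capability */
    if (len > table[cap].size || off > table[cap].size - len)
        return false;  /* out of bounds (written to avoid overflow) */
    *vaddr = table[cap].base + off;
    return true;
}
```

Because the capability part is never used as an address, a program can only reach objects that the table grants it, and pointer arithmetic stays confined to the offset half.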

Besides the (somewhat narrow) use in the microkernel, this could be useful for other purposes

Simplifying the implementation of managed languages’ VMs

Working with multiple virtual address spaces at once

SLIDE 34

Prior Art

Nordström S., Lindh L., Johansson L., Skoglund T.: Application Specific Real-Time Microkernel in Hardware, 14th IEEE-NPSS Real Time Conference, 2005

Offloading basic microkernel operations (e.g. thread creation, context switching) to hardware was shown to improve performance by 15 % on average and by up to 73 %

This was a coarse-grained approach

Hardware message passing in Intel SCC and Tilera TILE-G64/TILE-Pro64

Asynchronous message passing with tight software integration

SLIDE 35

Prior Art (2)

Hajj I. E., Merritt A., Zellweger G., Milojicic D., Achermann R., Faraboschi P., Hwu W., Roscoe T., Schwan K.: SpaceJMP: Programming with Multiple Virtual Address Spaces, 21st ACM ASPLOS, 2016

Practical programming model for using multiple virtual address spaces on commodity hardware (evaluated on DragonFly BSD and Barrelfish)

Useful for data-centric applications for sharing large amounts of memory between processes

Intel IA-32 Task State Segment (TSS)

Hardware-based context switching

Historically, it has been used by Linux

The primary reason for removal was not performance, but portability

SLIDE 36

Prior Art (3)

Intel VT-x VM Functions (VMFUNC)

Efficient cross-VM function calls

Switching the EPT and passing register arguments

Current implementation limited to 512 entry points

Practically usable even for very fine-grained virtualization with the granularity of individual functions

Liu Y., Zhou T., Chen K., Chen H., Xia Y.: Thwarting Memory Disclosure with Efficient Hypervisor-enforced Intra-domain Isolation, 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015

– “The cost of a VMFUNC is similar with a syscall”

– “… hypervisor-level protection at the cost of system calls”

SkyBridge paper to appear at EuroSys 2019

SLIDE 37

Prior Art (4)

Woodruff J., Watson R. N. M., Chisnall D., Moore S., Anderson J., Davis B., Laurie B., Neumann P. G., Norton R., Roe M.: The CHERI capability model: Revisiting RISC in an age of risk, 41st ACM Annual International Symposium on Computer Architecture, 2014

Hardware-based capability model for byte-granularity memory protection

Extension of the 64-bit MIPS ISA

Evaluated on an extended MIPS R4000 FPGA soft-core

32 capability registers (256 bits)

Limitation: Inflexible design, mostly due to the tight backward compatibility with a 64-bit ISA

Intel MPX

Several design and implementation issues, deemed not production-ready

SLIDE 38

Summary

Traditionally, hardware has not been designed to accommodate the requirements of microkernel multiserver operating systems

Microkernels thus suffer performance penalties

This prevented them from replacing monolithic operating systems and closed the vicious cycle

Co-designing the hardware and software might help us gain the benefits of the microkernel multiserver design with no performance penalties

However, it requires some out-of-the-box thinking

RISC-V has a “once in a lifetime” opportunity to reshape the entire computer industry

Finally moving from unsafe and insecure monolithic systems to microkernels

SLIDE 39

Acknowledgements

OS Kernel Lab at Huawei Technologies

Javier Picorel

Haibo Chen

SLIDE 40

Huawei Dresden R&D Lab

Focusing on microkernel research, design and development

Basic research

Applied research

Prototype development

Collaboration with academia and other technology companies

Looking for senior operating system researchers, designers, developers and experts

Previous microkernel experience is a big plus

“A startup within a large company”

Shaping the future product portfolio of Huawei

Including hardware/software co-design via HiSilicon

SLIDE 41

Q & A

SLIDE 42

Thank You!