Hardware/Software Co-Design for Efficient Microkernel Execution

SLIDE 1

Hardware/Software Co-Design for Efficient Microkernel Execution

Martin Děcký martin.decky@huawei.com

February 2019

SLIDE 2

Martin Děcký, FOSDEM, February 3rd 2019, Hardware/Software Co-Design for Efficient Microkernel Execution

Who Am I

Passionate programmer and operating systems enthusiast

With a specific inclination towards multiserver microkernels

HelenOS developer since 2004

Research Scientist from 2006 to 2018

Charles University (Prague), Distributed Systems Research Group

Senior Research Engineer since 2017

Huawei Technologies (Munich), German Research Center, Central Software Institute, OS Kernel Lab

SLIDE 3

Microkernel Multiserver Systems are better than Monolithic Systems

SLIDE 4

Monolithic OS Design is Flawed

Biggs S., Lee D., Heiser G.: The Jury Is In: Monolithic OS Design Is Flawed: Microkernel-based Designs Improve Security, ACM 9th Asia-Pacific Workshop on Systems (APSys), 2018

“While intuitive, the benefits of the small TCB have not been quantified to date. We address this by a study of critical Linux CVEs, where we examine whether they would be prevented or mitigated by a microkernel-based design. We find that almost all exploits are at least mitigated to less than critical severity, and 40 % completely eliminated by an OS design based on a verified microkernel, such as seL4.”
SLIDE 5

Problem Statement

SLIDE 6

Problem Statement

Microkernel design ideas go back as far as 1969

RC 4000 Multiprogramming System nucleus (Per Brinch Hansen)

Isolation of unprivileged processes, inter-process communication, hierarchical control

Even after 50 years they are not fully accepted as mainstream

Hardware and software used to be designed independently

Designing CPUs used to be an extremely complicated and costly process

Operating systems used to be written after the CPUs were designed

Hardware designs used to be rather conservative

SLIDE 7

Problem Statement (2)

Mainstream ISAs used to be designed in a rather conservative way

Can you name some really revolutionary ISA features since IBM System/370 Advanced Function?

Requirements on the new ISAs usually follow the needs of the mainstream operating systems running on the past ISAs

No wonder microkernels suffer performance penalties compared to monolithic systems

The more fine-grained the architecture, the more penalties it suffers

Let us design the hardware with microkernels in mind!

SLIDE 8

The Vicious Cycle

CPUs do not support microkernels properly

SLIDE 9

The Vicious Cycle

CPUs do not support microkernels properly

Microkernels suffer performance penalties

SLIDE 10

The Vicious Cycle

CPUs do not support microkernels properly

Microkernels suffer performance penalties

Microkernels are not in the mainstream

SLIDE 11

The Vicious Cycle

CPUs do not support microkernels properly

Microkernels suffer performance penalties

Microkernels are not in the mainstream

No requirements on CPUs from microkernels

SLIDE 13

Any Ideas?

SLIDE 14

Communication between Address Spaces

Control and data flow between subsystems

Monolithic kernel

Function calls

Passing arguments in registers and on the stack

Passing direct pointers to memory structures

Multiserver microkernel

IPC via microkernel syscalls

Passing arguments in a subset of registers

Privilege level switch, address space switch

Scheduling (in case of asynchronous IPC)

Data copying or memory sharing with page granularity

SLIDE 15

Communication between Address Spaces (2)

Is the kernel round-trip of the IPC necessary?

Suggestion for synchronous IPC: Extended Jump/Call and Return instructions that also switch the address space

Communicating parties identified by a “call gate” (capability) containing the target address space and the PC of the IPC handler (implicit for return)

Call gates stored in a TLB-like hardware cache (CLB)

CLB populated by the microkernel similarly to TLB-only memory management architecture

Suggestion for asynchronous IPC: Using CPU cache lines as the buffers for the messages

Async Jump/Call, Async Return and Async Receive instructions

Using the CPU cache like an extended register stack engine
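
The synchronous IPC suggestion on this slide can be illustrated with a small software model. This is only a sketch under assumptions: the names `CallGate`, `CallGateLookasideBuffer`, `Cpu`, `xcall` and `xret` are hypothetical, the LRU replacement policy is an arbitrary choice, and the return stack stands in for whatever mechanism the hardware would use to make the return target implicit. The `refill` callback plays the role of the microkernel populating the CLB on a miss.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallGate:
    """Hypothetical call gate: target address space plus IPC handler PC."""
    target_asid: int
    handler_pc: int


class CallGateLookasideBuffer:
    """Toy model of a TLB-like cache of call gates (the CLB)."""

    def __init__(self, capacity, refill):
        self.capacity = capacity
        self.refill = refill          # stands in for the microkernel miss handler
        self.entries = OrderedDict()  # gate id -> CallGate, least recently used first
        self.misses = 0

    def lookup(self, gate_id):
        if gate_id in self.entries:
            self.entries.move_to_end(gate_id)   # mark as most recently used
            return self.entries[gate_id]
        self.misses += 1
        gate = self.refill(gate_id)             # raises KeyError for an unknown gate
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used gate
        self.entries[gate_id] = gate
        return gate


class Cpu:
    """Toy CPU state: current address space id and program counter."""

    def __init__(self, clb, asid=0, pc=0):
        self.clb = clb
        self.asid = asid
        self.pc = pc
        self.return_stack = []  # assumption: how the implicit return target is kept

    def xcall(self, gate_id, return_pc):
        """Extended call: switch address space and jump to the IPC handler."""
        gate = self.clb.lookup(gate_id)
        self.return_stack.append((self.asid, return_pc))
        self.asid, self.pc = gate.target_asid, gate.handler_pc

    def xret(self):
        """Extended return: switch back to the caller's address space."""
        self.asid, self.pc = self.return_stack.pop()
```

In this model only the first call through a gate involves the kernel (the refill); repeated calls through the same gate hit in the CLB and switch address spaces without a kernel round-trip.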

SLIDE 16

Communication between Address Spaces (3)

Bulk data

Observation: Memory sharing is actually quite efficient for large amounts of data (multiple pages)

Overhead is caused primarily by creating and tearing down the shared pages

Data needs to be page-aligned

Sub-page granularity and dynamic data structures

Suggestion: Using CPU cache lines as shared buffers

Much finer granularity than pages (typically 64 to 128 bytes)

A separate virtual-to-cache mapping mechanism before the standard virtual-to-physical mapping
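
The granularity argument can be made concrete with a little arithmetic. The sketch below assumes a 4096-byte page (the slide does not state a page size) and the 64-byte lower end of the cache line range mentioned above; `shared_bytes` is a hypothetical helper, not part of any proposal.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
LINE_SIZE = 64     # lower end of the typical cache line size from the slide


def shared_bytes(payload, granule):
    """Bytes that must be shared to carry `payload` bytes at a given granule size."""
    granules = -(-payload // granule)  # ceiling division without floats
    return granules * granule


def overhead(payload, granule):
    """Bytes shared beyond the payload itself."""
    return shared_bytes(payload, granule) - payload
```

Under these assumptions a 48-byte message occupies 4096 bytes when shared at page granularity but only 64 bytes at cache line granularity, while a 16384-byte buffer occupies 16384 bytes either way, which matches the observation that page sharing is efficient for bulk data.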

SLIDE 17

Fast Context Switching

Current microsecond-scale latency hiding mechanisms

Hardware multi-threading

Effective

Does not scale beyond a few threads

Operating system context switching

Scales for any thread count

Too slow (order of 10 µs)

Goal: Finding a sweet spot between the two mechanisms

SLIDE 18

Fast Context Switching (2)

Suggestion: Hardware cache for contexts

Again, similar mechanism to TLB-only memory management

Dedicated instructions for context store, context restore, context switch, context save, context load

Context data could be potentially ABI-optimized

Autonomous mechanism for event-triggered context switch (e.g. external interrupt)

Efficient hardware mechanism for latency hiding

The equivalent of fine/coarse-grained simultaneous multithreading

The software scheduler is in charge of setting the scheduler policy

The CPU is in charge of scheduling the contexts based on ALU, cache and other resource availability
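
The split between software policy and hardware scheduling can be sketched as a toy model. Everything here is an assumption made for illustration: the names `Context` and `ContextCache`, the use of a single numeric priority as the policy, the `ready` flag as a stand-in for resource availability, and spilling the lowest-priority context when the cache is full.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    ctx_id: int
    priority: int          # policy input, set by the software scheduler
    ready: bool = True     # stand-in for "resources available", e.g. no pending miss
    registers: dict = field(default_factory=dict)


class ContextCache:
    """Toy model of a hardware cache holding a limited number of contexts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = {}    # ctx_id -> Context currently held in hardware
        self.backing = {}  # ctx_id -> Context spilled to memory

    def load(self, ctx):
        """Bring a context into the cache, spilling the lowest-priority one if full."""
        if ctx.ctx_id not in self.slots and len(self.slots) >= self.capacity:
            victim = min(self.slots.values(), key=lambda c: c.priority)
            self.backing[victim.ctx_id] = self.slots.pop(victim.ctx_id)
        self.backing.pop(ctx.ctx_id, None)
        self.slots[ctx.ctx_id] = ctx

    def pick_next(self):
        """Hardware side: choose the ready cached context with the highest priority."""
        ready = [c for c in self.slots.values() if c.ready]
        return max(ready, key=lambda c: c.priority, default=None)
```

The software scheduler only adjusts `priority` and decides which contexts to load; `pick_next` models the CPU choosing among cached contexts without entering the kernel.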

SLIDE 19

User Space Interrupt Processing

Extension of the fast context switching mechanism

Efficient delivery of interrupt events to user space device drivers

Without the routine microkernel intervention

An interrupt could be directly handled by a preconfigured hardware context in user space

A clear path towards moving even the timer interrupt handler and the scheduler from kernel space to user space

Going back to interrupt-driven handling of peripherals with extremely low latency requirements (instead of polling)

The usual pain point: Level-triggered interrupts

Some coordination with the platform interrupt controller is probably needed to automatically mask the interrupt source
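
The level-triggered pain point can be shown with a toy model of the interrupt controller. The class and method names below are hypothetical and the behaviour is an assumption about what "automatically mask the interrupt source" could mean: a delivered line stays masked until the user space driver signals that it has serviced the device.

```python
class InterruptController:
    """Toy model: level-triggered lines routed to preconfigured user space contexts."""

    def __init__(self):
        self.routes = {}       # irq -> handler context id, configured by the microkernel
        self.masked = set()    # lines masked automatically on delivery
        self.asserted = set()  # lines currently held active by devices

    def configure(self, irq, ctx_id):
        self.routes[irq] = ctx_id

    def raise_line(self, irq):
        self.asserted.add(irq)

    def lower_line(self, irq):
        self.asserted.discard(irq)

    def deliver(self):
        """Return (irq, ctx_id) for one asserted, unmasked line and mask it, else None."""
        for irq in sorted(self.asserted - self.masked):
            if irq in self.routes:
                self.masked.add(irq)  # prevents re-delivery while the line stays high
                return irq, self.routes[irq]
        return None

    def end_of_interrupt(self, irq):
        """Called once the driver has serviced the device; unmasks the line."""
        self.masked.discard(irq)
```

Without the automatic mask, a level-triggered line that is still asserted would be delivered again immediately, before the user space driver had a chance to quiesce the device.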

SLIDE 20

Capabilities as First-Class Entities

Capabilities as unforgeable object identifiers

But eventually each access to an object needs to be bound-checked and translated into the (flat) virtual address space

Suggestion: Embedding the capability reference in pointers

RV128 (128-bit variant of RISC-V) would provide 64 bits for the capability reference and 64 bits for object offset

128-bit flat pointers are probably useless anyway

Besides the (somewhat narrow) use in the microkernel, this could be useful for other purposes

Simplifying the implementation of managed languages’ VMs

Working with multiple virtual address spaces at once
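
A minimal sketch of the 64 + 64 bit split, assuming the capability reference sits in the upper half of the pointer (the slide does not say which half) and that each capability resolves to a base address and length in a per-process table. `pack`, `unpack`, `ObjectEntry` and `translate` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


def pack(cap_ref, offset):
    """Build a 128-bit pointer: capability reference in the upper 64 bits (assumed)."""
    if not (0 <= cap_ref <= MASK64 and 0 <= offset <= MASK64):
        raise ValueError("fields must fit in 64 bits")
    return (cap_ref << 64) | offset


def unpack(ptr):
    """Split a 128-bit pointer into (capability reference, object offset)."""
    return (ptr >> 64) & MASK64, ptr & MASK64


@dataclass(frozen=True)
class ObjectEntry:
    base: int    # start of the object in the flat virtual address space
    length: int  # object size in bytes


def translate(ptr, table, size=1):
    """Bounds-check an access of `size` bytes and return its flat virtual address."""
    cap_ref, offset = unpack(ptr)
    obj = table[cap_ref]  # KeyError models an invalid capability reference
    if offset + size > obj.length:
        raise IndexError("access outside object bounds")
    return obj.base + offset
```

With a table entry of base 0x10000 and length 256 for capability 3, `translate(pack(3, 16), table, size=8)` yields 0x10010, while an 8-byte access at offset 250 is rejected.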

SLIDE 21

Prior Art

Nordström S., Lindh L., Johansson L., Skoglund T.: Application Specific Real-Time Microkernel in Hardware, 14th IEEE-NPSS Real Time Conference, 2005

Offloading basic microkernel operations (e.g. thread creation, context switching) to hardware shown to improve performance by 15 % on average and up to 73 %

This was a coarse-grained approach

Hardware message passing in Intel SCC and Tilera TILE-G64/TILE-Pro64

Asynchronous message passing with tight software integration

SLIDE 22

Prior Art (2)

Hajj I. E., Merritt A., Zellweger G., Milojicic D., Achermann R., Faraboschi P., Hwu W., Roscoe T., Schwan K.: SpaceJMP: Programming with Multiple Virtual Address Spaces, 21st ACM ASPLOS, 2016

Practical programming model for using multiple virtual address spaces on commodity hardware (evaluated on DragonFly BSD and Barrelfish)

Useful for data-centric applications for sharing large amounts of memory between processes

Intel IA-32 Task State Segment (TSS)

Hardware-based context switching

Historically, it has been used by Linux

The primary reason for removal was not performance, but portability

SLIDE 23

Prior Art (3)

Intel VT-x VM Functions (VMFUNC)

Efficient cross-VM function calls

Switching the EPT and passing register arguments

Current implementation limited to 512 entry points

Practically usable even for very fine-grained virtualization with the granularity of individual functions

Liu Y., Zhou T., Chen K., Chen H., Xia Y.: Thwarting Memory Disclosure with Efficient Hypervisor-enforced Intra-domain Isolation, 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015

– “The cost of a VMFUNC is similar with a syscall”

– “… hypervisor-level protection at the cost of system calls”

SkyBridge paper to appear at EuroSys 2019

SLIDE 24

Prior Art (4)

Woodruff J., Watson R. N. M., Chisnall D., Moore S., Anderson J., Davis B., Laurie B., Neumann P. G., Norton R., Roe M.: The CHERI capability model: Revisiting RISC in an age of risk, 41st ACM Annual International Symposium on Computer Architecture, 2014

Hardware-based capability model for byte-granularity memory protection

Extension of the 64-bit MIPS ISA

Evaluated on an extended MIPS R4000 FPGA soft-core

32 capability registers (256 bits)

Limitation: Inflexible design mostly due to the tight backward compatibility with a 64-bit ISA

Intel MPX

Several design and implementation issues, deemed not production-ready

SLIDE 25

Summary

Traditionally, hardware has not been designed to accommodate the requirements of microkernel multiserver operating systems

Microkernels thus suffer performance penalties

This prevented them from replacing monolithic operating systems and closed the vicious cycle

Hardware design is hopefully becoming more accessible and democratic

E.g. RISC-V

Co-designing the hardware and software might help us gain the benefits of the microkernel multiserver design with no performance penalties

However, it requires some out-of-the-box thinking

SLIDE 26

Acknowledgements

OS Kernel Lab at Huawei Technologies

Javier Picorel

Haibo Chen

SLIDE 27

Huawei Dresden R&D Lab

Focusing on microkernel research, design and development

Basic research

Applied research

Prototype development

Collaboration with academia and other technology companies

Looking for senior operating system researchers, designers, developers and experts

Previous microkernel experience is a big plus

“A startup within a large company”

Shaping the future product portfolio of Huawei

Including hardware/software co-design via HiSilicon

SLIDE 28

Q & A

SLIDE 29

Thank You!