The impact of Meltre and Specdown on microkernel systems



slide-1
SLIDE 1

The impact of Meltre and Specdown on microkernel systems

Matthias Lange, Kernkonzept GmbH, FOSDEM 2019

slide-2
SLIDE 2

“We need to talk about Meltre and Specdown.”

– Conf call with customer, early 2018

slide-3
SLIDE 3

The impact of Meltdown and Spectre on the L4Re microkernel system

slide-4
SLIDE 4

Questions

  • Were we prepared?
  • Did microkernel design principles protect or help us?
  • What’s the impact of implemented mitigations?
slide-5
SLIDE 5

Questions - Spoiler

  • Were we prepared? No
  • Did microkernel design principles protect or help us? A little bit
  • What’s the impact of implemented mitigations? 😦

slide-6
SLIDE 6

Meltdown & Spectre

Set of vulnerabilities in modern CPUs

slide-7
SLIDE 7

Meltdown

slide-8
SLIDE 8

Classic virtual address space layout

[Figure: virtual address space with a User region and a Kernel region; boundaries marked at 3 GB and 4 GB]

slide-9
SLIDE 9

Classic virtual address space layout

[Figure: virtual address space with a User region and a Kernel region; boundaries marked at 3 GB and 4 GB; annotation “1:1”]

slide-10
SLIDE 10

L4Re’s virtual address space layout

  • Fiasco reserves fixed amount of memory for itself
  • Not all physical memory is mapped in the kernel
  • Uses big pages for mapping
  • Mapping may include user memory
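
The last two bullets can be illustrated with a little arithmetic: when a reserved region is covered with naturally aligned large pages, the mapping is rounded out to large-page boundaries and can therefore cover memory that was never reserved for the kernel. The sketch below is only an illustration. The 4 MiB page size, the addresses and the function names are assumptions made for the example, not Fiasco's actual layout.

```python
from typing import Tuple

# Assumed large-page size for the example (x86 also has 2 MiB large pages).
SUPERPAGE = 4 * 1024 * 1024


def superpage_span(start: int, size: int, page: int = SUPERPAGE) -> Tuple[int, int]:
    """Return the [begin, end) range that is actually mapped when
    [start, start + size) is covered with naturally aligned large pages."""
    begin = (start // page) * page                    # round start down
    end = ((start + size + page - 1) // page) * page  # round end up
    return begin, end


def extra_bytes(start: int, size: int, page: int = SUPERPAGE) -> int:
    """Bytes covered by the large pages that lie outside the reserved region."""
    begin, end = superpage_span(start, size, page)
    return (end - begin) - size


if __name__ == "__main__":
    # Hypothetical numbers: 6 MiB reserved, starting at 17 MiB.
    reserved_start = 17 * 1024 * 1024
    reserved_size = 6 * 1024 * 1024
    begin, end = superpage_span(reserved_start, reserved_size)
    print(hex(begin), hex(end), extra_bytes(reserved_start, reserved_size))
```

With these made-up numbers, 8 MiB end up mapped for a 6 MiB reservation; whatever lives in the remaining 2 MiB, possibly user memory, is visible through the kernel mapping.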
slide-11
SLIDE 11

L4Re’s virtual address space layout

[Figure: virtual address space with a User region and a Kernel region; boundaries marked at 3 GB and 4 GB; annotation “1:1”]

slide-12
SLIDE 12

Solution: Kernel address space

  • Move kernel into its own address space
  • Fiasco uses a CPU local address space
  • User address space only maps absolutely necessary parts
      • GDT, TSS, entry / exit stack, UTCBs
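
A toy model may help to picture the split. The table that is active in user mode contains the user's own mappings plus the small set of structures named on the slide, and everything else belongs to the kernel only and becomes reachable after kernel entry. The class, its methods and the region names `app code` and `kernel heap` are hypothetical and exist only for this sketch; it does not model real page tables.

```python
# Structures that must stay mapped in user mode (taken from the slide).
ENTRY_SET = {"GDT", "TSS", "entry/exit stack", "UTCBs"}


class AddressSpaces:
    """Toy model: one user-mode view and one kernel-mode view."""

    def __init__(self, user_regions, kernel_regions):
        # View that is active while user code runs.
        self.user_view = set(user_regions) | ENTRY_SET
        # View that is active inside the kernel (contents are illustrative).
        self.kernel_view = set(kernel_regions) | ENTRY_SET
        self.active = self.user_view

    def kernel_entry(self):
        # Switch to the kernel's own address space.
        self.active = self.kernel_view

    def kernel_exit(self):
        # Switch back to the reduced user view.
        self.active = self.user_view

    def is_mapped(self, region):
        return region in self.active


if __name__ == "__main__":
    spaces = AddressSpaces({"app code"}, {"kernel heap"})
    print(spaces.is_mapped("kernel heap"))  # user mode: not mapped
    spaces.kernel_entry()
    print(spaces.is_mapped("kernel heap"))  # kernel mode: mapped
```

The point of the design is that a Meltdown-style read from user mode finds no translation for kernel data at all, at the price of an address space switch on every kernel entry and exit.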
slide-13
SLIDE 13

Benchmarks - PTI

slide-14
SLIDE 14

Benchmarks - Meta

  • Baseline
      • Fiasco GitHub commit 566cc120, January 1st, 2018
  • Head
      • Fiasco GitHub commit 591c8c0b, January 7th, 2019
  • Compiler: clang 6 for the kernel, gcc 7.3 for userland
  • Core i7-5700EQ, 2.60GHz
  • Contact me if interested in raw data
slide-15
SLIDE 15

Benchmarks - Scenario 1

[Diagram: boxes labelled L4Linux, iperf3, L4Linux, iperf3 and L4Re Microkernel]

slide-16
SLIDE 16

Benchmarks - Scenario 2

[Diagram: boxes labelled L4Linux, iperf3, L4Linux, iperf3 and L4Re Microkernel; connection labelled virtio p2p link]

slide-17
SLIDE 17

Micro benchmarks - pingpong, PTI

[Chart: pingpong micro benchmark; categories IPC inter AS, Context switch, Thread switch (intra); axis 1000 to 4000; three values per series, in extraction order]

  • Baseline 2018: 422, 1759, 1561
  • PTI: 963, 2586, 3371
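
The numbers on this slide can be turned into slowdown factors. The sketch below assumes that the smaller set of values (422, 1759, 1561) is the 2018 baseline and the larger set (963, 2586, 3371) is the PTI run, and that the values pair up by position. The slide text states neither the unit nor which value belongs to which category, so the result is reported per position only.

```python
# Values as they appear on the slide; the assignment to series is an assumption.
baseline_2018 = [422, 1759, 1561]
pti = [963, 2586, 3371]


def slowdown(base, measured):
    """Factor by which each measured value exceeds its baseline value."""
    return [round(m / b, 2) for b, m in zip(base, measured)]


if __name__ == "__main__":
    print(slowdown(baseline_2018, pti))
```

Under these assumptions every position is between roughly 1.5 and 2.3 times more expensive with PTI enabled.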

slide-18
SLIDE 18

Benchmarks - Scenario 1, PTI

[Chart: iperf3 throughput; axis 2.5 to 10 Gbit/s]

  • Baseline 2018: 9.37 Gbit/s
  • PTI: 9.27 Gbit/s

slide-19
SLIDE 19

Benchmarks - Scenario 2, PTI

[Chart: iperf3 throughput; axis 1.5 to 6 Gbit/s]

  • Baseline 2018: 5.14 Gbit/s
  • PTI: 3.17 Gbit/s

slide-20
SLIDE 20

Spectre

slide-21
SLIDE 21

Spectre

  • Indirect branch prediction speculatively accesses data, causing side effects

slide-22
SLIDE 22

Spectre NG

  • Speculative access to FPU state while current context is not the owner

  • Fiasco uses lazy FPU switching
slide-23
SLIDE 23

Spectre NG - Mitigation

  • Fiasco now supports eager switching on x86
  • Does this incur any performance loss?
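
The difference between lazy and eager switching can be sketched with a small simulation. With lazy switching the FPU registers keep the previous thread's contents until the new thread first touches the FPU, so a transient read can observe them. With eager switching the registers are saved and reloaded on every switch. The `Cpu` class and the `leak` helper are hypothetical and only model the idea, not Fiasco's implementation.

```python
class Cpu:
    """Very small model of one CPU's FPU register file."""

    def __init__(self, eager):
        self.eager = eager
        self.regs = None     # what the physical FPU registers hold right now
        self.owner = None    # thread whose state is in the registers
        self.current = None  # thread that is currently running
        self.saved = {}      # per-thread FPU state saved in memory

    def _load_state_for(self, thread):
        # Save the old owner's registers, then load the new thread's state.
        if self.owner is not None:
            self.saved[self.owner] = self.regs
        self.regs = self.saved.get(thread)
        self.owner = thread

    def switch_to(self, thread):
        self.current = thread
        if self.eager:
            self._load_state_for(thread)
        # Lazy: leave the registers alone; the first FPU use will trap.

    def write_fpu(self, value):
        if self.owner != self.current:
            # Lazy path: models the trap taken on first FPU use.
            self._load_state_for(self.current)
        self.regs = value

    def transient_read(self):
        # Models a speculative access that ignores the ownership check.
        return self.regs


def leak(eager):
    """What thread B can transiently observe right after a switch from A."""
    cpu = Cpu(eager)
    cpu.switch_to("A")
    cpu.write_fpu("secret-of-A")
    cpu.switch_to("B")
    return cpu.transient_read()


if __name__ == "__main__":
    print("lazy :", leak(eager=False))
    print("eager:", leak(eager=True))
```

The eager variant pays for a save and restore on every switch, whether or not the next thread uses the FPU, which is why the performance question on the slide arises.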
slide-24
SLIDE 24

Benchmarks - Eager FPU switching

slide-25
SLIDE 25

Micro benchmarks - pingpong, PTI, eager FPU

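[Chart: pingpong micro benchmark; categories IPC inter AS, Context switch, Thread switch (intra); axis 1000 to 4000; three values per series, in extraction order]

  • Baseline 2018: 422, 1759, 1561
  • PTI: 963, 2586, 3371
  • PTI, eager FPU: 1149, 2918, 3729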

slide-26
SLIDE 26

Benchmarks - Scenario 1, PTI, eager FPU

[Chart: iperf3 throughput; axis 2.5 to 10 Gbit/s]

  • Baseline 2018: 9.37 Gbit/s
  • PTI: 9.27 Gbit/s
  • PTI, eager FPU: 9 Gbit/s

slide-27
SLIDE 27

Benchmarks - Scenario 2, PTI, eager FPU

[Chart: iperf3 throughput; axis 1.5 to 6 Gbit/s]

  • Baseline 2018: 5.14 Gbit/s
  • PTI: 3.17 Gbit/s
  • PTI, eager FPU: 3.12 Gbit/s

slide-28
SLIDE 28

Spectre continued

  • Most variants do not work across process boundaries
  • Usually code execution required
slide-29
SLIDE 29

Spectre continued - Mitigations

  • Fiasco mitigations
      • Indirect branch prediction barrier at kernel entry
      • Full prediction barrier at context switch
      • (microcode loading functionality)
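
Why a prediction barrier helps can be shown with a deliberately simplified predictor model: one table of branch targets shared by all contexts, which an attacker can train before the victim runs. The barrier is modelled as a complete flush of that table. Real predictors are indexed and tagged in more complex ways, and `ToyPredictor`, `victim_prediction` and the address `0x4000` are assumptions made for this sketch.

```python
class ToyPredictor:
    """Shared branch target table: branch address -> predicted target."""

    def __init__(self):
        self.targets = {}

    def train(self, branch, target):
        self.targets[branch] = target

    def predict(self, branch):
        # None means "no prediction available".
        return self.targets.get(branch)

    def barrier(self):
        # Modelled as discarding everything learned so far.
        self.targets.clear()


def victim_prediction(barrier_on_switch):
    """Prediction the victim gets for its indirect branch after the
    attacker context has run and trained the same branch address."""
    predictor = ToyPredictor()
    predictor.train(0x4000, "attacker_gadget")  # attacker context runs
    if barrier_on_switch:
        predictor.barrier()                     # context switch with barrier
    return predictor.predict(0x4000)            # victim context runs


if __name__ == "__main__":
    print(victim_prediction(barrier_on_switch=False))
    print(victim_prediction(barrier_on_switch=True))
```

Without the barrier the victim speculates towards the attacker's chosen target. With it, the training is gone, and so is any useful prediction state the victim had built up earlier, which hints at where the cost comes from.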
slide-30
SLIDE 30

Benchmarks - IBRS

😦

slide-31
SLIDE 31

Micro benchmarks - pingpong, IBRS

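[Chart: pingpong micro benchmark; categories IPC inter AS, Context switch, Thread switch (intra); axis 4500 to 18000; three values per series, in extraction order]

  • Baseline 2018: 422, 1759, 1561
  • PTI: 963, 2586, 3371
  • PTI, eager FPU: 1149, 2918, 3729
  • PTI, IBRS, eager FPU: 2638, 8820, 16601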

slide-32
SLIDE 32

Benchmarks - Scenario 1, IBRS

[Chart: iperf3 throughput; axis 2.5 to 10 Gbit/s]

  • Baseline 2018: 9.37 Gbit/s
  • PTI: 9.27 Gbit/s
  • PTI, eager FPU: 9 Gbit/s
  • PTI, IBRS, eager FPU: 7.68 Gbit/s

slide-33
SLIDE 33

Benchmarks - Scenario 2, IBRS

[Chart: iperf3 throughput; axis 1.5 to 6 Gbit/s]

  • Baseline 2018: 5.14 Gbit/s
  • PTI: 3.17 Gbit/s
  • PTI, eager FPU: 3.12 Gbit/s
  • PTI, IBRS, eager FPU: 1.28 Gbit/s
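
To put the iperf3 results of both scenarios side by side, the throughput loss relative to the 2018 baseline can be computed directly from the slide values. The sketch assumes that the highest value on each slide belongs to the baseline and that throughput falls as mitigations are added, since the extracted text does not tie values to legend entries explicitly.

```python
# Throughput in Gbit/s, taken from the Scenario 1 and Scenario 2 slides.
# The assignment of values to configurations is an assumption.
scenario_1 = {
    "Baseline 2018": 9.37,
    "PTI": 9.27,
    "PTI, eager FPU": 9.0,
    "PTI, IBRS, eager FPU": 7.68,
}
scenario_2 = {
    "Baseline 2018": 5.14,
    "PTI": 3.17,
    "PTI, eager FPU": 3.12,
    "PTI, IBRS, eager FPU": 1.28,
}


def loss_percent(baseline, measured):
    """Throughput lost relative to the baseline, in percent."""
    return round((baseline - measured) / baseline * 100, 1)


def losses(results):
    """Loss per configuration, compared with the 'Baseline 2018' entry."""
    base = results["Baseline 2018"]
    return {
        name: loss_percent(base, value)
        for name, value in results.items()
        if name != "Baseline 2018"
    }


if __name__ == "__main__":
    for label, results in (("Scenario 1", scenario_1), ("Scenario 2", scenario_2)):
        print(label, losses(results))
```

Under these assumptions Scenario 1 loses about 18 % with all mitigations enabled, while Scenario 2, which uses the virtio p2p link, loses about 75 %.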

slide-34
SLIDE 34

Foreshadow

L1 Terminal Fault

slide-35
SLIDE 35

L1 Terminal Fault

  • Affects OS / SMM, VT-x and SGX
  • SGX not supported in L4Re
      • Don’t care
  • SMM needs to protect itself
slide-36
SLIDE 36

L1 Terminal Fault - L4Re mitigations

  • OS
      • Fiasco is not vulnerable
      • We zero our PTEs
  • VT-x is nasty
      • Microcode update
      • New MSR and new instruction for L1D flush
      • Flush L1D on every vmresume
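
The “We zero our PTEs” bullet is easier to see in bits. L1TF concerns page-table entries that are marked not present but still carry a physical frame address, which a transient load may use to look up data in the L1 cache. The sketch below contrasts clearing only the present bit with zeroing the whole entry. The helper names are hypothetical, and the mask covers the frame-address bits 12 to 51 of an x86-64 entry.

```python
PRESENT = 1 << 0
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # frame address bits 12..51 of an x86-64 PTE


def make_pte(phys, flags=PRESENT):
    """Build a page-table entry for a physical frame address."""
    return (phys & ADDR_MASK) | flags


def unmap_clear_present_only(pte):
    """Unmap by clearing the present bit; the address bits stay behind."""
    return pte & ~PRESENT


def unmap_zero(pte):
    """Unmap by writing a zero entry."""
    return 0


def stale_address(pte):
    """Frame address a transient load could pick up from a non-present
    entry, or None if the entry is present and the access is a normal one."""
    if pte & PRESENT:
        return None
    return pte & ADDR_MASK


if __name__ == "__main__":
    pte = make_pte(0x1234_5000)
    print(hex(stale_address(unmap_clear_present_only(pte))))
    print(hex(stale_address(unmap_zero(pte))))
```

A zeroed entry leaves no stale frame address behind; it can only ever refer to frame 0.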
slide-37
SLIDE 37

Benchmarks - Sorry, no benchmarks for L1TF.

slide-38
SLIDE 38

But there is one more thing …

slide-39
SLIDE 39

One more thing

  • All features / mitigations are configurable
  • You can turn off
      • PTI
      • Eager FPU
      • IBRS
  • How does this compare to the 2018 baseline?
slide-40
SLIDE 40

Micro benchmarks - pingpong

[Chart: pingpong micro benchmark; categories IPC inter AS, Context switch, Thread switch (intra); axis 4500 to 18000; series: Baseline 2018; PTI; PTI, eager FPU; PTI, IBRS, eager FPU; Baseline 2019]

slide-41
SLIDE 41

Micro benchmarks - pingpong

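[Chart: pingpong micro benchmark; categories IPC inter AS, Context switch, Thread switch (intra); axis 1000 to 4000; three values per series, in extraction order]

  • Baseline 2018: 422, 1759, 1561
  • Baseline 2019: 425, 1733, 1422
  • PTI: 963, 2586, 3371
  • PTI, eager FPU: 1149, 2918, 3729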

slide-42
SLIDE 42

Benchmarks - Scenario 1

[Chart: iperf3 throughput; axis 2.5 to 10 Gbit/s]

  • Baseline 2018: 9.37 Gbit/s
  • Baseline 2019: 9.29 Gbit/s
  • PTI: 9.27 Gbit/s
  • PTI, eager FPU: 9 Gbit/s

slide-43
SLIDE 43

Benchmarks - Scenario 2

[Chart: iperf3 throughput; axis 1.5 to 6 Gbit/s]

  • Baseline 2018: 5.14 Gbit/s
  • Baseline 2019: 5.14 Gbit/s
  • PTI: 3.17 Gbit/s
  • PTI, eager FPU: 3.12 Gbit/s

slide-44
SLIDE 44

Conclusion

slide-45
SLIDE 45

“Fiasco is still not the fastest microkernel in the world.”

– Me

slide-46
SLIDE 46

Conclusion

  • Some bugs did not hit as hard
  • “missing” features helped us
  • Dramatic performance impact
  • Consider alternatives to microcode
  • Reconsider existing legacy implementations
  • Removed IO page fault
  • What to expect in the future? How can we act proactively?
  • gcc vs. clang