CuriOS: Improving Reliability through Operating System Structure


SLIDE 1

Introduction Related OSs Design of CuriOS Evaluation Conclusion

CuriOS: Improving Reliability through Operating System Structure

Nils Asmussen

Paper Reading Group

08/29/2012

SLIDE 2

Outline

1. Introduction
2. Related OSs
3. Design of CuriOS
4. Evaluation
5. Conclusion

SLIDE 4

Motivation

- OS reliability is still a major issue
- Microkernels improve it by isolating components from each other
- But most of them don't support restartability, or at least not in a satisfying way
- Problem 1: Blindly restarting services does not help because of client-specific state
- Problem 2: Servers still have too many rights (e.g. destroying the state of client A while serving client B)

SLIDE 5

Alternatives

- Redundancy in HW and SW helps but is expensive
- Writing clients that are aware of faulting services is possible but difficult
- Checkpointing
  - Requires multiple checkpoints to avoid rolling back to a broken state
  - Leads to high memory and performance overhead
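The checkpointing trade-off above can be illustrated with a minimal Python sketch. It is not taken from the paper or from any of the systems discussed: `CheckpointStore`, `save`, `restore` and the `is_valid` callback are hypothetical names, and the sketch assumes some validity check exists that can recognise a broken snapshot.

```python
import copy
from collections import deque


class CheckpointStore:
    """Keeps the last `capacity` snapshots of a service's state (hypothetical helper)."""

    def __init__(self, capacity):
        # Memory cost grows with the number of snapshots that are retained.
        self.snapshots = deque(maxlen=capacity)

    def save(self, state):
        # Every checkpoint copies the whole state, which is the performance cost.
        self.snapshots.append(copy.deepcopy(state))

    def restore(self, is_valid):
        # Walk from the newest to the oldest snapshot and skip broken ones.
        for snapshot in reversed(self.snapshots):
            if is_valid(snapshot):
                return copy.deepcopy(snapshot)
        return None  # no usable checkpoint left


store = CheckpointStore(capacity=3)
for count in (1, 2, -1):  # the last saved state is already corrupted
    store.save({"count": count})

recovered = store.restore(lambda s: s["count"] >= 0)
print(recovered)  # {'count': 2}
```

With a single checkpoint the store would only hold the corrupted state, so the rollback would fail; keeping more snapshots avoids that at the price of more copies.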

SLIDE 7

Brief Description

Minix3
- Reincarnation server is responsible for restarting crashed services and drivers
- Works well only for stateless drivers/services
- Provides a Datastore that can be used for checkpoints

L4/Iguana
- Collection of OS services running on top of L4Ka::Pistachio
- Offers resource management, protection and some device drivers
- No support for restartability

SLIDE 8

Brief Description

Chorus
- Services run in privileged mode and share the address space of the kernel
- “Hot restart” mechanism allows servers to maintain their state
- No technique to prevent corruption of state
- Chorus OS services don’t take advantage of “hot restart”

EROS
- Saves snapshots periodically to disk
- Performs some consistency checks and keeps multiple snapshots

SLIDE 9

Comparison

Kernel      Restartability
Minix3      Works only for stateless services
L4/Iguana   Might work for stateless services
Chorus      Does not work for stateful (?), stateless?
EROS        May work by restoring checkpoint

SLIDE 10

Observations

- Transparency of addressing → Clients should be able to use the same address
- Suspension of clients → No timeouts or new requests during recovery
- Persistence of client-specific state → Results of previous requests should persist
- Isolation of client-specific state → An error should not corrupt the state of unaffected clients

SLIDE 12

Overview

SLIDE 13

Server State Management

Basics
- Servers that need client-specific state use the state management of CuiK
- A Server State Region (SSR) is an object that can be memory protected
- It is created when a client establishes a connection to a server
- A server can only access the SSR while it is processing a request from the corresponding client
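The access rule for SSRs can be modelled with a small Python sketch. This is a toy model under stated assumptions, not CuiK's interface: `Kernel`, `connect`, `ssr`, `protected_call` and `AccessViolation` are hypothetical names, and the memory protection is simulated with a check instead of page mappings.

```python
class AccessViolation(Exception):
    """Stands in for the memory fault a real system would raise."""


class Kernel:
    """Toy stand-in for the kernel's SSR management (hypothetical API)."""

    def __init__(self):
        self._ssrs = {}       # client id -> state; one SSR per connection
        self._mapped = None   # client whose SSR is currently accessible

    def connect(self, client):
        # The SSR is created when the client connects to the server.
        self._ssrs[client] = {}

    def ssr(self, client):
        # Only the SSR of the client being served may be touched.
        if client != self._mapped:
            raise AccessViolation(f"SSR of {client!r} is not mapped")
        return self._ssrs[client]

    def protected_call(self, server, client, request):
        self._mapped = client          # grant access for this request only
        try:
            return server(self, client, request)
        finally:
            self._mapped = None        # revoke access again


def counter_server(kernel, client, request):
    state = kernel.ssr(client)
    state["count"] = state.get("count", 0) + request
    return state["count"]


def buggy_server(kernel, client, request):
    # Tries to overwrite client A's state no matter who is calling.
    kernel.ssr("A")["count"] = -1


kernel = Kernel()
kernel.connect("A")
kernel.connect("B")
print(kernel.protected_call(counter_server, "A", 5))  # 5

try:
    kernel.protected_call(buggy_server, "B", 0)
except AccessViolation as err:
    print("blocked:", err)

print(kernel.protected_call(counter_server, "A", 2))  # 7
```

In the model, the faulty request on behalf of client B cannot damage the state of client A, which is the isolation property from the Observations slide.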

SLIDE 15

Server State Management

Server types

1. Servers that do not require all client states for operation
2. Servers that need all client states

Consistency checks

1. The recovery routine checks magic numbers stored in objects
2. Server-specific checks can be implemented to ensure that pointers and numbers are within expected ranges
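Both kinds of check can be sketched in a few lines of Python. The magic value, the `offset` field and the function names are hypothetical, and discarding every state that fails a check is an assumption of this sketch rather than something the slides specify.

```python
MAGIC = 0xC0FFEE  # hypothetical magic number stored in every state object


def state_is_consistent(state, buffer_size):
    # Generic check: the magic number must be intact.
    if state.get("magic") != MAGIC:
        return False
    # Server-specific check: the offset must lie within the expected range.
    offset = state.get("offset")
    return isinstance(offset, int) and 0 <= offset < buffer_size


def surviving_states(ssrs, buffer_size):
    # Assumption: states that fail a check are dropped during recovery.
    return {
        client: state
        for client, state in ssrs.items()
        if state_is_consistent(state, buffer_size)
    }


ssrs = {
    "A": {"magic": MAGIC, "offset": 10},
    "B": {"magic": 0, "offset": 10},        # magic number overwritten
    "C": {"magic": MAGIC, "offset": 5000},  # offset out of range
}
print(sorted(surviving_states(ssrs, buffer_size=4096)))  # ['A']
```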

SLIDE 17

Error Recovery

SLIDE 18

Performance

Operation                    Instructions   Time
Context Switch               ?              74 µs
Protected Call Without SSR   1594 ± 4       195.7 ± 0.5 µs
Protected Call With SSR      4893 ± 3       378.9 ± 0.9 µs
Error detection + Recovery   ?              X * 100 µs
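To make the cost of SSRs explicit, the two protected-call rows can be compared directly. The short Python calculation below uses only the mean values from the table and ignores the stated error margins; the variable names are illustrative.

```python
# Mean values taken from the table above (error margins ignored).
without_ssr = {"instructions": 1594, "time_us": 195.7}
with_ssr = {"instructions": 4893, "time_us": 378.9}

extra_instructions = with_ssr["instructions"] - without_ssr["instructions"]
extra_time_us = with_ssr["time_us"] - without_ssr["time_us"]
instruction_ratio = with_ssr["instructions"] / without_ssr["instructions"]
time_ratio = with_ssr["time_us"] / without_ssr["time_us"]

print(f"extra instructions: {extra_instructions}")   # 3299
print(f"extra time: {extra_time_us:.1f} µs")         # 183.2 µs
print(f"instruction ratio: {instruction_ratio:.2f}") # 3.07
print(f"time ratio: {time_ratio:.2f}")               # 1.94
```

By these figures, using an SSR roughly triples the instruction count of a protected call and nearly doubles its duration.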

SLIDE 20

Memory

SSRs
- Each SSR is memory protected and thus has to be on its own page (1KB on ARM)
- Assuming that typical client states are quite small, you waste nearly one page per client

POs
- Each PO has its own heap and a stack for every thread that uses the PO
- They say that the overhead per PO is in the order of tens of KBs
- Taking into account that they designed the file service to use one PO per open file, this is quite a lot
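A rough back-of-the-envelope calculation shows the scale of both overheads. Only the 1KB page size comes from the slide; the 64-byte client state, the 20KB per PO, and the counts of 100 clients and 100 open files are made-up example inputs, and the function names are hypothetical.

```python
PAGE_SIZE = 1024  # bytes; the page size stated on the slide


def ssr_waste_bytes(num_clients, state_bytes, page_size=PAGE_SIZE):
    # Each SSR occupies whole pages, so the unused rest of the last page is wasted.
    pages = max(1, -(-state_bytes // page_size))  # ceiling division, at least one page
    return num_clients * (pages * page_size - state_bytes)


def po_overhead_kb(open_files, overhead_kb_per_po):
    # One PO per open file, each with its own heap and stacks.
    return open_files * overhead_kb_per_po


# Example inputs (assumptions, not measurements):
print(ssr_waste_bytes(num_clients=100, state_bytes=64))      # 96000
print(po_overhead_kb(open_files=100, overhead_kb_per_po=20))  # 2000
```

With these example numbers, 100 clients waste about 94KB in SSR pages, while 100 open files cost about 2MB in PO overhead.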

SLIDE 22

Conclusion

- Nice concepts for restartable services and for protecting the state of unaffected clients against corruption
- Unsatisfying evaluation
- A lot of open questions ...

SLIDE 23

Discussion Questions

- How big is the private heap in POs and can it grow?
- How do they place programs in the single-address-space OS? PIC? Statically specified?
- Shouldn’t it be possible to build a similar system with multiple address spaces?
- Performance overhead? Comparison? Real workload?
- Is the memory overhead really acceptable?
- What about some kind of segmentation instead of paging?
