ASTEROID – AN ANALYZABLE, RESILIENT, EMBEDDED REAL-TIME OPERATING SYSTEM DESIGN
Björn Döbel, Hermann Härtig (TU Dresden)
Philip Axer, Rolf Ernst (TU Braunschweig)
Böblingen, 07.02.2013
The Many Faces of Hardware Faults
[1] Shirvani, McCluskey: Fault-Tolerant Systems in a Space Environment: The CRC ARGOS Project, 1998
[2] Schroeder, Pinheiro, Weber: DRAM Errors in the Wild: A Large-Scale Field Study, SIGMETRICS 2009
[3] Hwang, Stefanovici, Schroeder: Cosmic Rays Don’t Strike Twice: Understanding the Nature of DRAM Errors and the Implications for System Design, ASPLOS 2012
[4] Pinheiro, Weber, Barroso: Failure Trends in a Large Disk Drive Population, FAST 2007
[5] Shivakumar, Kistler, Keckler: Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic, DSN 2002
ASTEROID slide 1 of 18
[Figure: system architecture. Components: Replicated Driver, Unreplicated Application, Replicated Application, L4 Runtime Environment, Romain, L4/Fiasco.OC microkernel. Annotation: Reliable Computing Base [6].]
[6] Döbel, Härtig, Engel: Operating System Support for Redundant Multithreading, EMSOFT 2012
ASTEROID slide 2 of 18
[Figure: one ResCore and ten NonRes Cores.]
[7] Döbel, Härtig: Who watches the watchmen? – Protecting Operating System Reliability Mechanisms, HotDep 2012
ASTEROID slide 3 of 18
[Figure: processor pipeline. Labels: IF, ID, RA, EXE, MEM, WB; Instruction FP, Data FP; X; result, inst, Chunk CNT, exception, retire.]
[8] Axer, Ernst, Döbel, Härtig: Designing an Analyzable and Resilient Embedded Operating System, SOBRES 2012
ASTEROID slide 4 of 18
[Figure: Basic Block with Entry and Exit. Labels: Signature Setup, Signature generation, Signature check.]
ASTEROID slide 5 of 18
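The three labels on this slide (signature setup, signature generation, signature check) can be illustrated with a minimal sketch. It assumes that setup happens at basic-block entry and the check at block exit, that the signature is a CRC-32 over the instruction bytes, and that reference signatures are computed ahead of time. None of these choices is stated on the slide, and the names `expected_signature` and `run_block` are hypothetical.

```python
import zlib

def expected_signature(instructions):
    """Offline step: fold every instruction encoding of a basic block into one CRC-32."""
    sig = 0
    for insn in instructions:
        sig = zlib.crc32(insn, sig)
    return sig

def run_block(fetched, reference):
    """Run-time step: rebuild the signature from the fetched instructions, check at exit."""
    sig = 0                              # signature setup (assumed: at block entry)
    for insn in fetched:
        sig = zlib.crc32(insn, sig)      # signature generation, one update per instruction
    return sig == reference              # signature check (assumed: at block exit)

# Hypothetical basic block: mov eax, [ebx]; nop; ret (x86 encodings)
block = [bytes([0x8B, 0x03]), bytes([0x90]), bytes([0xC3])]
reference = expected_signature(block)

# The same block with a single flipped bit in the first instruction
faulty = [bytes([0x8B, 0x03 ^ 0x04]), bytes([0x90]), bytes([0xC3])]
```

Calling `run_block(block, reference)` accepts the unmodified block, while `run_block(faulty, reference)` rejects the corrupted one, since a CRC detects any single-bit change in a message of the same length.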
benchmarks were 6.6 instructions.
[Figure: Error-injection experiment. Categories: F, D, RA, E. Axis: 0% to 100%. Outcome classes: data correct and no control flow change; data correct and control flow change; data incorrect (dc); exception (exc); not terminating (nt).]
ASTEROID slide 6 of 18
Timing effects in the hardware architecture (e.g., retransmission on buses, NoC) [9]
[Figure: timing diagram over t for tasks τ1 and τ2. Labels: C1, C2, E1, signaling overhead, reexecution, higher priority interference.]
[Figure: X+(t) on a logarithmic axis from 10^-15 to 10^0 over t [ms] from 20 to 180. Series: F1 b, F1 c, F1 m, F9 b, F9 c, F9 m.]
[9] Axer, Ernst: Stochastic Response-Time Guarantee for Non-Preemptive, Fixed-Priority Scheduling Under Error, 2013, to appear
ASTEROID slide 9 of 18
times:
[Figure: Bitcount, Parallel vs. Sequential. WCRT [ms] (100 to 800) over Load core 1 and Load core 2 (each 0.0 to 1.0).]
[Figure: Rijndael, Parallel vs. Sequential. WCRT [ms] (200 to 1600) over Load core 1 and Load core 2 (each 0.0 to 1.0).]
[10] Real-Time Analysis with pyCPA – http://code.google.com/p/pycpa
ASTEROID slide 10 of 18
tolerance requirements
replication
know what to protect
FEHLER: [11]
– Evaluate usefulness of PVF vs. fault injection
– Outline challenges and possible solutions
[Figure: Register EDX. Panels: PVF, FI, and | PVF - FI |. Y-axis: Failure Ratio. X-axis: Time [10k Instruction Blocks], 50 to 350.]
11 D ¨
ASTEROID slide 11 of 18
– Execution overhead (100x to 1000x)
– Adds complexity to RCB
Disassembler: 6,000 LoC
Tiny emulator: 500 LoC
ASTEROID slide 12 of 18
[Figure: Master and Replica. Labels: mov eax, [ebx]; load replica state; NOP; NOP; ...; NOP; restore master state; mov eax, [ebx].]
ASTEROID slide 13 of 18
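One possible reading of the sequence on this slide is that the master executes the replica's memory-access instruction on the replica's behalf. Under that reading, it copies the instruction into a fixed-size buffer, pads the remainder with NOPs, loads the replica state, runs the buffer, and restores its own state. The sketch below illustrates only the buffer-construction step. The 16-byte buffer size and the name `build_trampoline` are assumptions for illustration, not details taken from the talk.

```python
NOP = 0x90  # x86 single-byte NOP

def build_trampoline(insn: bytes, size: int = 16) -> bytes:
    """Copy one instruction into a fixed-size buffer and pad the rest with NOPs."""
    if not 0 < len(insn) <= size:
        raise ValueError("instruction does not fit into the buffer")
    return insn + bytes([NOP]) * (size - len(insn))

# mov eax, [ebx] is encoded as 8B 03 on 32-bit x86
MOV_EAX_EBX = bytes([0x8B, 0x03])
trampoline = build_trampoline(MOV_EAX_EBX)
```

A fixed buffer size keeps the code that follows the copied instruction at a known offset regardless of the instruction's length, which is presumably why the slide shows a run of NOPs before "restore master state".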
[Figure: block sequence A1 A2 A3 A4, A1 A2 A3 A4, B1 B2 B3, B1 B2 B3, C1 C2 C3 C4, C1 C2 C3 C4.]
ASTEROID slide 14 of 18
[Figure, first step: block sequence A1 A2 A3 A4 A1 A2 A3 A4 B1 B2 B3 B1 B2 B3 C1 C2 C3 C3 C1 C2 C3 C4.]
[Figure, second step: block sequence A1 A2 A3 A4 A1 A2 A3 A4 B1 B2 B3 C1 C2 C3 C3 C1 C2 C3 B1 B2 B3 B4.]
ASTEROID slide 15 of 18
[12] Liu, Curtsinger, Berger: DThreads: Efficient Deterministic Multithreading, OSDI 2011
[13] Olszewski, Ansel, Amarasinghe: Kendo: Efficient Deterministic Multithreading in Software, ASPLOS 2009
ASTEROID slide 16 of 18
This slide intentionally left blank. Except for above text.
ASTEROID slide 19 of 18
[Figure: three Replicas and the Master. Master labels: System Call Proxy, Resource Manager, state comparison (=).]
ASTEROID slide 20 of 18
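The "=" in the figure suggests that the master compares replica states before acting on their behalf. A minimal sketch of such a comparison with majority voting follows. Representing a state as a dictionary of register values and the helper name `vote` are illustrative assumptions. The slides do not describe how the comparison is implemented.

```python
from collections import Counter

def vote(states):
    """Return (majority_state, indices_of_deviating_replicas).

    Raises RuntimeError if no state is shared by a strict majority.
    """
    # Dicts are unhashable, so count a canonical, order-independent form.
    keys = [tuple(sorted(s.items())) for s in states]
    winner, count = Counter(keys).most_common(1)[0]
    if count * 2 <= len(states):
        raise RuntimeError("no majority among replicas")
    faulty = [i for i, k in enumerate(keys) if k != winner]
    return dict(winner), faulty

replicas = [
    {"eax": 1, "ebx": 0x1000},
    {"eax": 1, "ebx": 0x1000},
    {"eax": 5, "ebx": 0x1000},  # hypothetical corrupted value in eax
]
state, faulty = vote(replicas)
```

With three replicas, one deviating replica can be identified and outvoted. With only two, a mismatch can be detected but not attributed, which is why the sketch raises an error when no strict majority exists.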
[Figure: regions 1 to 6 shown for Replica 1, Replica 2, and Master.]
ASTEROID slide 21 of 18
[Figure: regions 1 to 6 shown for Replica 1, Replica 2, and Master. Legend: Marked used; Master private.]
ASTEROID slide 22 of 18
[Figure: Replica 1 (rw, ro, ro), Replica 2 (rw, ro, ro), and Master.]
ASTEROID slide 23 of 18
[14] Döbel, Härtig, Engel: Operating System Support for Redundant Multithreading, EMSOFT 2012
ASTEROID slide 24 of 18