OPERATING SYSTEM SUPPORT FOR REDUNDANT MULTITHREADING
Björn Döbel (TU Dresden)
Hermann Härtig (TU Dresden)
Michael Engel (TU Dortmund)
Tampere, 08.10.2012
Fault Tolerance: State of the Union
Axes: non-COTS vs. COTS; hardware errors vs. software errors
Approaches shown: RAD-hard CPUs, Redundant Multithreading, HP NonStop, IBM z/OS, seL4, Minix3, Carburizer, SWIFT, Encoded Processing, Romain
Operating System Support for Redundant Multithreading slide 1 of 13
Operating System Support for Redundant Multithreading slide 2 of 13
Architecture diagram labels: Replicated Driver, Unreplicated Application, Replicated Application, Romain, L4 Runtime Environment, L4/Fiasco.OC microkernel, Reliable Computing Base
Operating System Support for Redundant Multithreading slide 3 of 13
Diagram labels: Master, three Replicas, comparison (=), System Call Proxy, Resource Manager
Operating System Support for Redundant Multithreading slide 4 of 13
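The "=" in the master/replica diagram suggests that the master compares replica states. A minimal sketch of such a comparison with majority voting follows. It assumes each replica's state can be summarized as a dictionary of register values; the function name `vote` and the state layout are illustrative assumptions, not Romain's actual interface.

```python
from collections import Counter


def vote(states):
    """Return (majority_state, faulty_indices) for a list of replica states.

    Each state is a dict mapping a register name to a value (hypothetical
    layout). Raises ValueError when no strict majority exists, which is the
    case for a mismatch between only two replicas (DMR).
    """
    # Dicts are not hashable, so compare on a canonical tuple form.
    keys = [tuple(sorted(s.items())) for s in states]
    winner, count = Counter(keys).most_common(1)[0]
    if count * 2 <= len(states):
        raise ValueError("no majority: error detected but not correctable")
    faulty = [i for i, k in enumerate(keys) if k != winner]
    return dict(winner), faulty
```

With three replicas (TMR) a single deviating replica is outvoted and identified; with two replicas (DMR) a mismatch can only be detected, so the sketch raises an error instead of picking a winner.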
Diagram: entries numbered 1 to 6 for each of Replica 1, Replica 2, and Master
Operating System Support for Redundant Multithreading slide 5 of 13
Diagram: entries numbered 1 to 6 for each of Replica 1, Replica 2, and Master; legend: marked used, master private
Operating System Support for Redundant Multithreading slide 6 of 13
Diagram: Replica 1 (rw, ro, ro), Replica 2 (rw, ro, ro), Master
Operating System Support for Redundant Multithreading slide 7 of 13
– Execution overhead (x100 - x1000)
– Adds complexity to RCB
Disassembler: 6,000 LoC
Tiny emulator: 500 LoC
Operating System Support for Redundant Multithreading slide 8 of 13
Diagram labels: Master, Replica
Instruction: mov eax, [ebx]
Sequence: load replica state; NOP; NOP; ...; NOP; restore master state
Operating System Support for Redundant Multithreading slide 9 of 13
Operating System Support for Redundant Multithreading slide 10 of 13
Operating System Support for Redundant Multithreading slide 11 of 13
Operating System Support for Redundant Multithreading slide 12 of 13
Operating System Support for Redundant Multithreading slide 13 of 13
This slide intentionally left blank. Except for above text.
Operating System Support for Redundant Multithreading slide 14 of 13
protect the RCB (HW or SW)
– Has not been done for kernel code yet
– Only protects SW components
– Too expensive
ResCores and NonRes-Cores
Diagram: one ResCore and ten NonRes Cores
Operating System Support for Redundant Multithreading slide 15 of 13
unreplicated run
in EMSOFT paper
– 12x Intel Core2 2.6 GHz
– Replicas pinned to dedicated physical cores
– Hyperthreading off
Figure: Overhead by notification method. Y-axis: overhead in % (ticks 10 to 60). Methods: Local Faults, Migration, Sync IPC, Shared Mem. Benchmarks: susan and CRC32, each with DMR and TMR.
Operating System Support for Redundant Multithreading slide 16 of 13
Missed CPU exceptions → detected by watchdog
Spurious CPU exceptions → detected by watchdog / state comparison
Transmission of corrupt state → detected during state comparison
Overwriting remote state during transmission
Operating System Support for Redundant Multithreading slide 17 of 13
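The fault list above names a watchdog as the detector for missed CPU exceptions. A minimal sketch of the idea, assuming a timeout-based scheme: once one replica reports reaching a synchronization point, the others must follow within a deadline. The class `Watchdog`, its method names, and the explicit `now` timestamps are hypothetical and chosen for illustration; the slides do not describe the actual mechanism.

```python
class Watchdog:
    """Tracks which replicas reached the current synchronization point."""

    def __init__(self, replicas, timeout):
        self.replicas = set(replicas)
        self.timeout = timeout
        self.started = None   # time at which the watchdog was armed
        self.arrived = set()  # replicas that have reported so far

    def arm(self, now):
        # Called when the first replica reaches a synchronization point.
        self.started = now
        self.arrived = set()

    def report(self, replica):
        # Called when a replica's exception notification arrives.
        self.arrived.add(replica)

    def missing(self, now):
        # Replicas that still have not reported after the timeout expired.
        if self.started is None or now - self.started <= self.timeout:
            return set()
        return self.replicas - self.arrived
```

Passing timestamps in explicitly keeps the sketch deterministic; a real implementation would presumably rely on a timer interrupt instead.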
http://www.dynamo-dresden.de
Operating System Support for Redundant Multithreading slide 18 of 13