CAN WE PUT CONCURRENCY BACK INTO REDUNDANT MULTITHREADING?
Björn Döbel and Hermann Härtig (TU Dresden)
New Delhi, 14.10.2014

Motivation: Transient Hardware Faults
Radiation-induced soft errors
– Mainly an issue in avionics+space
– Google study: > 2% of DRAM DIMMs failing per year1
– ECC insufficient2
1 Schroeder, Pinheiro, Weber: DRAM Errors in the Wild: A Large-Scale Field Study, SIGMETRICS 2009
2 Hwang, Stefanovici, Schroeder: Cosmic Rays Don't Strike Twice, ASPLOS 2012
3 Dixit, Wood: The Impact of New Technology on Soft Error Rates, IRPS 2011
RomainMT slide 1 of 17
[Figure: system stack — Replicated Driver, Unreplicated Application, and Replicated Application on top of the Romain L4 Runtime Environment, running on the L4/Fiasco.OC microkernel]
RomainMT slide 2 of 17
[Figure: a Master process runs three Replicas; the Master acts as System Call Proxy and Resource Manager and compares (=) the replicas' states]
Details: Döbel, Härtig, Engel: Operating System Support for Redundant Multithreading, EMSOFT 2012
RomainMT slide 3 of 17
[Figure: with single-threaded replicas, every replica emits the same event sequences — A1 A2 A3 A4, B1 B2 B3, C1 C2 C3 C4 — so the Master can match events across replicas]
RomainMT slide 4 of 17
[Figure: with multithreaded replicas, the event streams diverge — one replica emits C1 C2 C3 C3 where another emits C1 C2 C3 C4, and thread B's events arrive in a different order — so events no longer match up across replicas]
RomainMT slide 5 of 17
int x = 1;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *thread_A(void *data) {
    pthread_mutex_lock(&m);
    x = x + 1;
    pthread_mutex_unlock(&m);
    return NULL;
}

void *thread_B(void *data) {
    pthread_mutex_lock(&m);
    x = x * 2;
    pthread_mutex_unlock(&m);
    return NULL;
}

/* final x is 4 if A locks first, 3 if B locks first */
RomainMT slide 6 of 17
– Reliance on ECC-protected memory6
– Our work reuses ideas from Kendo.7

4 Bergan et al.: CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution, ASPLOS 2010
5 Aviram et al.: Efficient System-Enforced Deterministic Parallelism, OSDI 2010
6 Mushtaq et al.: Efficient Software-Based Fault Tolerance Approach on Multicore Platforms, DATE 2013
7 Olszewski et al.: Kendo: Efficient Deterministic Multithreading in Software, ASPLOS 2009
RomainMT slide 7 of 17
– pthread_mutex_lock
– pthread_mutex_unlock
– pthread_lock
– pthread_unlock
RomainMT slide 8 of 17
[Chart: microbenchmark execution time in seconds — Native 0.286 s; slowdown vs. native: Single 121x, DMR 197x, TMR 309x]
2 threads:

int x = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *thread(void *data) {
    for (int i = 0; i < 5000000; ++i) {
        pthread_mutex_lock(&m);
        x = x + 1;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}
RomainMT slide 9 of 17
[Figure: 12 CPUs across two sockets (Socket 0 and Socket 1); two placements of worker threads W1,1 … W3,2 and manager threads Mgr1–Mgr3 are compared — one groups all replicas of the same application thread next to that thread's manager, the other groups each replica's own threads together]
RomainMT slide 10 of 17
[Chart: microbenchmark execution time in seconds for Single, DMR, and TMR, each Unoptimized, with CPU Placement, and with Fast Synchronization; labeled slowdowns vs. native: 194x, 138x, 212x, 192x]
RomainMT slide 11 of 17
[Figure: each Replica (on CPU 0, 1, 2) links a replicated pthread library (pthr.rep) that accesses a shared Lock Info Page (LIP) provided by the ROMAIN Master]
RomainMT slide 12 of 17
lock_rep(mtx):
    loop:
        spinlock(mtx.spinlock)
        Owner free?  Yes: Store Owner ID, Store Owner Epoch; spinunlock(mtx.spinlock); return
        Owner self and Epoch matches?  Yes: spinunlock(mtx.spinlock); return
        No: spinunlock(mtx.spinlock); Yield CPU; retry
RomainMT slide 13 of 17
[Chart: optimized microbenchmark execution time in seconds for Single, DMR, and TMR, each Unoptimized, with CPU Placement, and with Fast Synchronization; labeled slowdowns vs. native: 2.17x, 13.5x, 11.7x, 67.9x, 31.1x, 30.3x]
RomainMT slide 14 of 17
[Chart: SPLASH2 benchmarks (Radiosity, Barnes, FMM, Raytrace, Water, Volrend, Ocean, FFT, LU, Radix, GEOMEAN), runtime normalized vs. native, scale roughly 0.9–1.8; series: Single Replica, Two Replicas, Three Replicas]
RomainMT slide 15 of 17
[Chart: same SPLASH2 benchmarks, runtime normalized vs. native, scale roughly 0.9–1.9 with outliers labeled 3.93x, 2.94x, 2.02x, and 2.02x; series: Single Replica, Two Replicas, Three Replicas]
Sources of overhead:
– interception
– allocation
RomainMT slide 16 of 17
– Enforced Determinism
– Cooperative Determinism
– Lock Density limits performance
http://spp1500.itec.kit.edu
RomainMT slide 17 of 17