Can We Put Concurrency Back Into Redundant Multithreading?

Björn Döbel and Hermann Härtig (TU Dresden), New Delhi, 14.10.2014


  1. CAN WE PUT CONCURRENCY BACK INTO REDUNDANT MULTITHREADING? Björn Döbel and Hermann Härtig (TU Dresden), New Delhi, 14.10.2014

  2. Motivation: Transient Hardware Faults
     • Radiation-induced soft errors
       – Mainly an issue in avionics and space
     • DRAM errors in large data centers
       – Google study: > 2% failing DRAM DIMMs per year [1]
       – ECC insufficient [2]
     • Decreasing transistor sizes → higher rate of errors in CPU functional units [3]
     [1] Schroeder, Pinheiro, Weber: DRAM Errors in the Wild: A Large-Scale Field Study, SIGMETRICS 2009
     [2] Hwang, Stefanovici, Schroeder: Cosmic Rays Don't Strike Twice, ASPLOS 2012
     [3] Dixit, Wood: The Impact of New Technology on Soft Error Rates, IRPS 2011
     RomainMT slide 1 of 17

  3. ASTEROID Operating System
     [Figure: architecture diagram with unreplicated and replicated components (a driver and applications) on top of the L4 Runtime Environment and Romain, which run on the L4/Fiasco.OC microkernel.]

  4-7. Romain: Structure
       [Figure, built up over four slides: a Master; three Replicas; a comparison ("=") between the replicas; and, inside the master, a Resource Manager and a System Call Proxy.]
       Details: Döbel, Härtig, Engel: Operating System Support for Redundant Multithreading, EMSOFT 2012

  8-10. How About Multithreading?
        [Figure, built up over three slides: schedule diagram of thread steps A1-A4, then B1-B3, then C1-C4; each step label appears twice.]

  11-12. Problem: Nondeterminism
         [Figure, two slides: the same thread steps of A, B and C as before, interleaved in a different order on each slide.]

  13-14. Nondeterminism Example

         #include <pthread.h>

         int x = 1;
         /* Static initializer; PTHREAD_MUTEX_DEFAULT is a mutex type, not an initializer. */
         pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

         void *thread_A(void *data)
         {
             pthread_mutex_lock(&m);
             x = x + 1;
             pthread_mutex_unlock(&m);
             return NULL;
         }

         void *thread_B(void *data)
         {
             pthread_mutex_lock(&m);
             x = x * 2;
             pthread_mutex_unlock(&m);
             return NULL;
         }

         • Race-free (locks)
         • (A;B) → x = 4
         • (B;A) → x = 3

  15-17. Solution: Deterministic Multithreading
         • Related work: debugging multithreaded programs
         • Compiler solutions [4]: no support for binary-only software
         • Workspace-Consistent Memory [5]: requires per-replica and per-thread memory copies
         • Lock-Based Determinism
           – Reliance on ECC-protected memory [6]
           – Our work reuses ideas from Kendo [7]
         [4] Bergan et al.: CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution, ASPLOS 2010
         [5] Aviram et al.: Efficient System-enforced Deterministic Parallelism, OSDI 2010
         [6] Mushtaq et al.: Efficient Software Based Fault Tolerance Approach on Multicore Platforms, DATE 2013
         [7] Olszewski et al.: Kendo: Efficient Deterministic Multithreading in Software, ASPLOS 2009

  18. Enforced Determinism
      • Adapt libpthread: place INT3 into four functions
        – pthread_mutex_lock
        – pthread_mutex_unlock
        – pthread_lock
        – pthread_unlock
      • Lock operations reflected to RomainMT master
      • Master enforces lock ordering

  19. Enforced Determinism: Microbenchmark

      int x = 0;
      /* Static initializer; PTHREAD_MUTEX_DEFAULT is a mutex type, not an initializer. */
      pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

      void *thread(void *data)
      {
          for (int i = 0; i < 5000000; ++i) {
              pthread_mutex_lock(&m);
              x = x + 1;
              pthread_mutex_unlock(&m);
          }
          return NULL;
      }

      [Figure: bar chart, execution time in seconds (axis 0 to 90), 2 threads. Native: 0.286 s; Single: 121x; DMR: 197x; TMR: 309x.]

  20-22. Optimization Opportunities?
         [Figure, built up over three slides: a two-socket machine with cores 0-5 on Socket 0 and cores 6-11 on Socket 1, showing two alternative placements of the replica worker threads W(i,j) and the manager threads Mgr 1-3. In the first placement the labels W1,2 W2,2 W3,2 appear next to Socket 1 and W1,1 W2,1 W3,1 and Mgr 1-3 next to Socket 0.]

  23. Optimized Enforced Determinism
      [Figure: bar chart, execution time in seconds (axis 0 to 90) for Single, DMR and TMR; series: Unoptimized, CPU Placement, Fast Synchronization. Bar labels: 212x, 194x, 192x, 138x.]

  24. Cooperative Determinism
      • Replication-aware libpthread
      • Replicas agree on acquisition order w/o master invocation
      • Trade-off: libpthread becomes single point of failure
      [Figure: three replicas on CPU 0, CPU 1 and CPU 2, each with its own replication-aware libpthread (pthr. rep) and a mapping of the Lock Info Page (LIP); the Lock Info Page and the Romain master sit below the replicas.]

  25. Cooperation: Lock Acquisition
      [Flowchart for lock_rep(mtx). Steps: spinlock(mtx.spinlock); decision "Owner free?"; decision "Owner self?"; decision "Epoch matches?"; Store Owner ID; Store Owner Epoch; spinunlock(mtx.spinlock); Yield CPU; return. The decisions carry Yes/No edges.]

  26. Cooperative Determinism: Microbenchmark
      [Figure: bar chart, execution time in seconds (axis 0 to 20) for Single, DMR and TMR; series: Unoptimized, CPU Placement, Fast Synchronization. Bar labels: 67.9x, 31.1x, 30.3x, 13.5x, 11.7x, 2.17x.]

  27. Overhead: SPLASH2, 2 workers
      [Figure: bar chart, runtime normalized vs. native (axis 0.9 to 1.8) for Radiosity, Barnes, FMM, Raytrace, Water, Volrend, Ocean, FFT, LU, Radix and GEOMEAN; series: Single Replica, Two Replicas, Three Replicas.]

  28-29. Overhead: SPLASH2, 4 workers
         [Figure: bar chart, runtime normalized vs. native (axis 0.9 to 1.9) for Radiosity, Barnes, FMM, Raytrace, Water, Volrend, Ocean, FFT, LU, Radix and GEOMEAN; series: Single Replica, Two Replicas, Three Replicas. Four bars exceed the axis and are labelled 3.93, 2.94, 2.02 and 2.02.]
         Sources of overhead:
         • System call interception
         • Frequent memory allocation
         • Cache effects
         • Lock density

  30. Summary
      • Redundant Multithreading as an OS service
      • Binary-only applications
      • Multithreaded replication using lock-based determinism
        – Enforced Determinism
        – Cooperative Determinism
        – Lock density limits performance
      • Application overhead for TMR: 24% (2 workers), 65% (4 workers)
      http://spp1500.itec.kit.edu
