CAN WE PUT CONCURRENCY BACK INTO REDUNDANT MULTITHREADING?
Björn Döbel and Hermann Härtig (TU Dresden)
New Delhi, 14.10.2014

Motivation: Transient Hardware Faults
• Radiation-induced soft errors
  – Mainly an issue in avionics and space
• DRAM errors in large data centers
  – Google study: > 2% of DRAM DIMMs fail per year [1]
  – ECC is insufficient [2]
• Decreasing transistor sizes → higher rate of errors in CPU functional units [3]

[1] Schroeder, Pinheiro, Weber: DRAM Errors in the Wild: A Large-Scale Field Study, SIGMETRICS 2009
[2] Hwang, Stefanovici, Schroeder: Cosmic Rays Don't Strike Twice, ASPLOS 2012
[3] Dixit, Wood: The Impact of New Technology on Soft Error Rates, IRPS 2011
RomainMT slide 1 of 17

ASTEROID Operating System
[Figure: an unreplicated application, a replicated application and a replicated driver; below them Romain and the L4 Runtime Environment; at the bottom the L4/Fiasco.OC microkernel]

Romain: Structure
[Figure: three replicas managed by a master; the master compares the replicas (=) and contains a resource manager and a system call proxy]
Details: Döbel, Härtig, Engel: Operating System Support for Redundant Multithreading, EMSOFT 2012
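The master's comparison step (=) can be illustrated with a minimal C sketch of majority voting over three replicas (TMR). It assumes, purely for illustration, that the state to compare fits into a small register snapshot and that comparisons happen when replicas externalize state, e.g. on system calls; `replica_state` and `tmr_vote` are hypothetical names, not Romain's interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical snapshot of the state a replica exposes when it
 * externalizes state (e.g. at a system call). */
typedef struct {
    unsigned long regs[4];   /* a few architectural registers */
} replica_state;

static bool same_state(const replica_state *a, const replica_state *b)
{
    /* The struct holds only an array of unsigned long, so there is no padding. */
    return memcmp(a, b, sizeof *a) == 0;
}

/* Majority vote over three replicas.
 * Returns the index of a replica whose state agrees with at least one
 * other replica, or -1 if all three differ (fault detected, not correctable). */
int tmr_vote(const replica_state r[3])
{
    assert(r != NULL);
    if (same_state(&r[0], &r[1]) || same_state(&r[0], &r[2]))
        return 0;
    if (same_state(&r[1], &r[2]))
        return 1;
    return -1;
}
```

With two replicas (DMR) the same comparison can only detect a mismatch; the third replica is what allows the master to pick the majority state.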

How About Multithreading?
[Figure: two replicas, each running threads A, B and C; every thread executes its events (A1 to A4, B1 to B3, C1 to C4) in both replicas]

Problem: Nondeterminism
[Figure: the events of threads A, B and C interleave in a different order in each replica, so the replicas' executions diverge]

Nondeterminism Example

```c
#include <pthread.h>
#include <stddef.h>

int x = 1;
/* PTHREAD_MUTEX_INITIALIZER statically initializes a default mutex. */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *thread_A(void *data)
{
    pthread_mutex_lock(&m);
    x = x + 1;
    pthread_mutex_unlock(&m);
    return NULL;
}

void *thread_B(void *data)
{
    pthread_mutex_lock(&m);
    x = x * 2;
    pthread_mutex_unlock(&m);
    return NULL;
}
```

• Race-free (protected by locks)
• (A;B) → x = 4
• (B;A) → x = 3
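The example can be turned into a runnable C sketch that executes both threads repeatedly and tallies the outcomes. Both results are legal executions of the same race-free program, which is exactly why two replicas may disagree. `run_once` and `count_outcomes` are illustrative helpers, not part of RomainMT.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int x;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

static void *thread_A(void *data)
{
    (void)data;
    pthread_mutex_lock(&m);
    x = x + 1;
    pthread_mutex_unlock(&m);
    return NULL;
}

static void *thread_B(void *data)
{
    (void)data;
    pthread_mutex_lock(&m);
    x = x * 2;
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Runs A and B concurrently once and returns the final x:
 * 4 if A acquired m first, 3 if B did. */
int run_once(void)
{
    pthread_t a, b;
    x = 1;
    pthread_create(&a, NULL, thread_A, NULL);
    pthread_create(&b, NULL, thread_B, NULL);
    pthread_join(a, NULL);   /* joining makes the final x visible here */
    pthread_join(b, NULL);
    return x;
}

/* Counts how often each outcome occurs over n runs. */
void count_outcomes(int n, int *fours, int *threes)
{
    *fours = *threes = 0;
    for (int i = 0; i < n; ++i) {
        int r = run_once();
        assert(r == 3 || r == 4);   /* the lock rules out any other result */
        if (r == 4)
            ++*fours;
        else
            ++*threes;
    }
}
```

How often each value appears depends on the scheduler; a short run on a given machine may even produce only one of them, so the sketch only checks that every result is 3 or 4.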

Solution: Deterministic Multithreading
• Related work: debugging multithreaded programs
• Compiler solutions [4]: no support for binary-only software
• Workspace-consistent memory [5]: requires per-replica and per-thread memory copies
• Lock-based determinism
  – Relies on ECC-protected memory [6]
  – Our work reuses ideas from Kendo [7]

[4] Bergan et al.: CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution, ASPLOS 2010
[5] Aviram et al.: Efficient System-Enforced Deterministic Parallelism, OSDI 2010
[6] Mushtaq et al.: Efficient Software-Based Fault Tolerance Approach on Multicore Platforms, DATE 2013
[7] Olszewski et al.: Kendo: Efficient Deterministic Multithreading in Software, ASPLOS 2009
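Kendo's central idea can be sketched as a deterministic turn rule: every thread carries a logical clock that advances deterministically (Kendo counts events such as retired stores; here the clock is simply a number), and a thread may take a free lock only when its clock is the smallest among all active threads, with ties broken by the lower thread ID. This is a simplified illustration with hypothetical names (`det_thread`, `kendo_turn`, `kendo_may_acquire`); the full algorithm also defines how clocks advance around lock operations.

```c
#include <assert.h>
#include <stddef.h>

/* Per-thread deterministic state (hypothetical, simplified). */
typedef struct {
    unsigned long clock;   /* deterministic logical time */
    int           active;  /* 0 once the thread has exited */
} det_thread;

/* Returns the ID of the thread whose turn it is, or -1 if none is active. */
int kendo_turn(const det_thread *t, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (!t[i].active)
            continue;
        /* Strict '<' keeps the lowest thread ID when clocks are equal. */
        if (best < 0 || t[i].clock < t[best].clock)
            best = (int)i;
    }
    return best;
}

/* A thread may take a lock only if the lock is free AND it is its turn,
 * so the acquisition order depends on logical clocks, not timing. */
int kendo_may_acquire(const det_thread *t, size_t n, int tid, int lock_free)
{
    assert(tid >= 0 && (size_t)tid < n);
    return lock_free && kendo_turn(t, n) == tid;
}
```

Because the clocks advance the same way in every replica, each replica arrives at the same acquisition order without communicating.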

Enforced Determinism
• Adapt libpthread: place INT3 (the x86 breakpoint instruction) into four functions
  – pthread_mutex_lock
  – pthread_mutex_unlock
  – pthread_lock
  – pthread_unlock
• Lock operations are reflected to the RomainMT master
• Master enforces lock ordering
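One way a master could enforce lock ordering is to let the first replica that reaches an acquisition decide which thread gets it, and make the other replicas replay that decision. The C sketch below models this bookkeeping for a single lock, assuming that thread IDs are consistent across replicas and that the master handles trapped requests one at a time; a `false` return means the master would defer its reply. All names are hypothetical, not RomainMT's implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REPLICAS 3
#define LOG_CAPACITY 1024

/* Master-side state for ONE lock (hypothetical). */
typedef struct {
    int      owner_log[LOG_CAPACITY];   /* thread ID of the k-th acquisition */
    unsigned logged;                    /* acquisitions decided so far */
    unsigned next[MAX_REPLICAS];        /* next acquisition index per replica */
    bool     held[MAX_REPLICAS];        /* is the lock held inside replica r? */
} lock_order;

/* Called when thread tid of replica r traps into the master on lock(). */
bool master_on_lock(lock_order *lo, int r, int tid)
{
    assert(r >= 0 && r < MAX_REPLICAS);
    if (lo->held[r])
        return false;                    /* ordinary mutual exclusion */
    unsigned k = lo->next[r];
    if (k == lo->logged) {               /* r is the leading replica: decide */
        assert(lo->logged < LOG_CAPACITY);
        lo->owner_log[lo->logged++] = tid;
    } else if (lo->owner_log[k] != tid) {
        return false;                    /* follower: not this thread's turn */
    }
    lo->held[r] = true;
    return true;
}

/* Called when a thread of replica r traps into the master on unlock(). */
void master_on_unlock(lock_order *lo, int r)
{
    assert(r >= 0 && r < MAX_REPLICAS && lo->held[r]);
    lo->held[r] = false;
    lo->next[r]++;
}
```

Every lock and unlock costs a trap into the master in this scheme, which is consistent with the large slowdowns reported for the lock-heavy microbenchmark.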

Enforced Determinism: Microbenchmark

```c
#include <pthread.h>
#include <stddef.h>

int x = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Each call performs 5,000,000 lock-protected increments. */
void *thread(void *data)
{
    for (int i = 0; i < 5000000; ++i) {
        pthread_mutex_lock(&m);
        x = x + 1;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}
```

Execution time with 2 threads (slowdown relative to native):
• Native: 0.286 s
• Single: 121x
• DMR: 197x
• TMR: 309x
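The slowdown factors translate into absolute run times and a rough per-iteration cost. The helpers below assume that both threads execute `thread()`, i.e. 2 × 5,000,000 lock/unlock iterations in total, and divide wall-clock time by that count; the constants come from the slide, while the function names are hypothetical.

```c
#include <assert.h>

#define NATIVE_SECONDS 0.286                 /* native run time from the slide */
#define ITERATIONS     (2.0 * 5000000.0)     /* assumption: 2 threads x 5M iterations */

/* Absolute run time for a given slowdown factor. */
double runtime_seconds(double slowdown)
{
    assert(slowdown > 0.0);
    return NATIVE_SECONDS * slowdown;
}

/* Wall-clock time divided by the total number of iterations, in ns. */
double ns_per_iteration(double slowdown)
{
    return runtime_seconds(slowdown) / ITERATIONS * 1e9;
}
```

Under that assumption, native execution costs about 28.6 ns per iteration, while TMR takes roughly 88.4 s in total, about 8.8 µs per iteration.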

Optimization Opportunities?
[Figure: a machine with two sockets (cores 0-5 on socket 0, cores 6-11 on socket 1) and alternative placements of the replicated worker threads W and the manager threads Mgr 1-3 onto these cores]
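The placement question can be made concrete with a small helper that maps replicated worker threads onto the 12 cores shown. It implements two hypothetical policies: keeping all replicas of one logical thread on the same socket, or keeping all threads of one replica together. Manager threads are left out and all names are illustrative. On Linux the resulting core could be applied with `pthread_setaffinity_np`; RomainMT itself runs on Fiasco.OC, whose mechanism is not shown here.

```c
#include <assert.h>

#define CORES_PER_SOCKET 6   /* cores 0-5 on socket 0, 6-11 on socket 1 */
#define NUM_SOCKETS      2

typedef enum {
    GROUP_BY_THREAD,   /* replicas of the same logical thread share a socket */
    GROUP_BY_REPLICA   /* all threads of one replica share a socket */
} placement_policy;

/* Returns the core for worker thread t of replica r (both 0-based),
 * or -1 if the group does not fit on its socket. */
int place_worker(placement_policy p, int r, int t, int replicas, int threads)
{
    assert(r >= 0 && r < replicas && t >= 0 && t < threads);
    int group      = (p == GROUP_BY_THREAD) ? t : r;          /* socket group */
    int member     = (p == GROUP_BY_THREAD) ? r : t;          /* index in group */
    int group_size = (p == GROUP_BY_THREAD) ? replicas : threads;
    int socket     = group % NUM_SOCKETS;                     /* alternate sockets */
    int slot       = (group / NUM_SOCKETS) * group_size + member;
    if (slot >= CORES_PER_SOCKET)
        return -1;
    return socket * CORES_PER_SOCKET + slot;
}
```

For three replicas of two threads, GROUP_BY_THREAD puts thread 0's replicas on cores 0-2 and thread 1's on cores 6-8, leaving cores 3-5 and 9-11 free for other work such as the managers.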

Optimized Enforced Determinism
[Figure: microbenchmark execution time in seconds for Single, DMR and TMR, comparing Unoptimized, CPU Placement and Fast Synchronization; bar labels include 212x, 194x, 192x and 138x]

Cooperative Determinism
• Replication-aware libpthread
• Replicas agree on the lock acquisition order without invoking the master
• Trade-off: libpthread becomes a single point of failure
[Figure: three replicas on CPUs 0, 1 and 2, each with a replication-aware libpthread (pthr. rep) mapping a shared Lock Info Page (LIP); the Romain master sits below]
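Agreement on the acquisition order through shared memory instead of the master can be sketched with C11 atomics. In this hypothetical design, a per-lock entry on the shared page records, for the k-th acquisition, which thread took the lock: the first replica to get there claims the slot with a compare-and-swap, and every other replica's k-th acquisition must go to the same thread. It assumes identical thread IDs across replicas and a fixed number of slots; names such as `lip_entry` and `lip_lock` are illustrative, not RomainMT's actual Lock Info Page layout.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_REPLICAS 3
#define LIP_SLOTS    1024
#define NO_OWNER     (-1)

/* Hypothetical shared entry for ONE lock, visible to all replicas. */
typedef struct {
    atomic_int  slot[LIP_SLOTS];      /* thread ID of the k-th acquisition */
    atomic_uint next[NUM_REPLICAS];   /* next acquisition index per replica */
} lip_entry;

void lip_init(lip_entry *e)
{
    for (int k = 0; k < LIP_SLOTS; ++k)
        atomic_init(&e->slot[k], NO_OWNER);
    for (int r = 0; r < NUM_REPLICAS; ++r)
        atomic_init(&e->next[r], 0u);
}

/* Thread tid of replica r tries to take the lock. It succeeds if the
 * current slot is unclaimed (this replica leads) or already names tid. */
bool lip_trylock(lip_entry *e, int r, int tid)
{
    unsigned k = atomic_load(&e->next[r]);
    assert(k < LIP_SLOTS);
    int expected = NO_OWNER;
    if (atomic_compare_exchange_strong(&e->slot[k], &expected, tid))
        return true;          /* we claimed the slot and fixed the order */
    return expected == tid;   /* another replica already chose tid */
}

void lip_lock(lip_entry *e, int r, int tid)
{
    while (!lip_trylock(e, r, tid))
        sched_yield();        /* not our turn yet: let other threads run */
}

/* Release: replica r moves on to its next acquisition slot. */
void lip_unlock(lip_entry *e, int r)
{
    atomic_fetch_add(&e->next[r], 1u);
}
```

Because the order lives in memory shared by all replicas, a fault that corrupts this entry affects every replica at once, which mirrors the single-point-of-failure trade-off named on the slide.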

Cooperation: Lock Acquisition
[Flowchart of lock_rep(mtx) with the steps spinlock(mtx.spinlock); the decisions "Owner self?", "Owner free?" and "Epoch matches?"; the actions "Store Owner ID", "Store Owner Epoch", "spinunlock(mtx.spinlock)" and "Yield CPU"; and a final return]

Cooperative Determinism: Microbenchmark
[Figure: microbenchmark execution time in seconds for Single, DMR and TMR, comparing Unoptimized, CPU Placement and Fast Synchronization; bar labels include 67.9x, 31.1x, 30.3x, 13.5x, 11.7x and 2.17x]

Overhead: SPLASH2, 2 workers
[Figure: runtime normalized to native (axis 0.9 to 1.8) for Radiosity, Barnes, FMM, Raytrace, Water, Volrend, Ocean, FFT, LU, Radix and their geometric mean (GEOMEAN), with a single replica, two replicas and three replicas]

Overhead: SPLASH2, 4 workers
[Figure: runtime normalized to native for the same benchmarks and GEOMEAN with a single replica, two replicas and three replicas; bars exceeding the 1.9 axis limit are labelled 3.93, 2.94, 2.02 and 2.02]
Sources of overhead:
• System call interception
• Frequent memory allocation
• Cache effects
• Lock density

Summary
• Redundant Multithreading as an OS Service
• Binary-only applications
• Multithreaded Replication using lock-based determinism
  – Enforced Determinism
  – Cooperative Determinism
  – Lock density limits performance
• Application overhead for TMR: 24% (2 workers), 65% (4 workers)
http://spp1500.itec.kit.edu
