Conflict Exceptions: Simplifying Concurrent Language Semantics with Precise Hardware Exceptions for Data-Races
Brandon Lucia, Luis Ceze, Karin Strauss, Shaz Qadeer and Hans-J. Boehm
2
Data-Races are Trouble
Complicated language specifications
Usually incorrect, and difficult to debug
Negative impact on system reliability
3
Fail-Stop Semantics for Data-Races
Semantics are clear and simple
Better data-race debugging
Safety: races can’t cause problems
When a data-race occurs, throw an exception
4
High-Performance - Always-on detection
Precise detection - No false positives
5
Figure: race detectors plotted on Performance vs. Precision axes
Happens-Before [Elmas’07, Flanagan’09]
[Savage’97, Zhou’07, Yu’05]
Conflict Exceptions [ISCA ’10]
6
Figure: Thread 1 and Thread 2 timelines split into Synchronization-Free Regions
Synchronization shown: Acquire(K) Release(K), Acquire(L) Release(L), Acquire(M) Release(M)
Accesses shown: Rd Y, Wr X, Rd T, Wr T, Rd Y, Wr Y, ..., Rd X, Wr Y, ...
Annotations: Conflict!; Exception Delivered Here; Undetected Race
7
Precisely detect only races that can affect consistency
The Guarantee:
Exception thrown? There was a data-race.
Exception-free? Sequential Consistency.
Ignoring unimportant races is key to performance
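The guarantee above can be illustrated with a small trace checker. This is a minimal sketch under stated assumptions, not the hardware mechanism: a trace is assumed to be a list of `(thread, op, target)` tuples in one global order, every `acq` or `rel` is assumed to end the issuing thread's current synchronization-free region, and the name `find_conflict` is hypothetical.

```python
from collections import defaultdict

def find_conflict(trace):
    """Return the index of the first access that conflicts with an access made
    by another thread's still-active synchronization-free region, else None."""
    reads = defaultdict(set)   # thread -> addresses read in its current region
    writes = defaultdict(set)  # thread -> addresses written in its current region
    for i, (tid, op, target) in enumerate(trace):
        if op in ("acq", "rel"):
            # Assumption: a synchronization operation ends the current region.
            reads[tid].clear()
            writes[tid].clear()
            continue
        for other in set(reads) | set(writes):
            if other == tid:
                continue
            # A read conflicts with a remote write; a write conflicts with
            # a remote read or a remote write.
            if target in writes[other] or (op == "wr" and target in reads[other]):
                return i  # the exception is delivered before this access
        (writes if op == "wr" else reads)[tid].add(target)
    return None
```

Under these assumptions, a trace in which thread 2 reads X while thread 1's region that wrote X is still open reports a conflict at the read. A trace in which thread 1's region has already ended reports nothing even if no lock orders the two accesses, which is the kind of race the slide labels "Undetected Race".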
8
Reordering in SFRs is legal
Figure: Rd Y and Wr X between Acquire(K) and Release(K)
Granularity independence
Figure: Wr64_Low X and Wr64_Hi X between Acquire(K) and Release(K)
Exception-Free executions are SC
Figure: Acq(K) Rel(K) Rd X Wr X, shown twice, marked ✓
9
Programming is the same
Racy programs are well-behaved
Race semantics are simpler
Figure: Rd Y and Wr X between pthread_lock(K) and pthread_unlock(K)
Figure: Wr Q, Wr Z, Acq(K), Rd X, Wr X, Acq(L), Rd X, with the conflict marked !
10
Concurrent, conflicting SFRs throw exceptions
Acq(K) Rd X Wr X Acq(L) Rd X !
All races have some exceptional schedule
Exception Handling: Log + Recover
Damage Control: Shut down buggy module
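The "Log + Recover" and "Shut down buggy module" options could look roughly like the following at the software level. This is only a sketch: `ConflictException` and `run_module` are hypothetical names, and how a hardware conflict exception would actually surface to a language runtime is not specified on the slides.

```python
import logging

class ConflictException(Exception):
    """Hypothetical software view of a hardware conflict exception."""

def run_module(module, disabled, log=logging.getLogger("conflicts")):
    """Run module(); on a conflict, log it and stop using that module."""
    name = module.__name__
    if name in disabled:
        return None                      # module was shut down earlier
    try:
        return module()
    except ConflictException as exc:
        log.error("conflict in %s: %s", name, exc)   # Log + Recover
        disabled.add(name)                           # Damage control
        return None
```

The caller keeps running after the exception, and later calls to the same module are skipped.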
12
New Instructions: BeginRegion and EndRegion
Figure: Rd Y, Wr X, Rd T, Wr T, Acquire(K), and Release(K) enclosed in BeginRegion/EndRegion pairs
Synchronization Operations are Singleton Regions
Exceptions Thrown Precisely Before Conflicting Instruction
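One way to picture region delimitation is as a pass over an instruction stream that wraps each synchronization operation in its own region and each run of ordinary accesses in another. This is a sketch under assumptions: operations are plain strings, `Acquire`/`Release` are the only synchronization operations, and `delimit_regions` is a hypothetical helper. The slides do not say who inserts the markers.

```python
SYNC_OPS = ("Acquire", "Release")

def delimit_regions(ops):
    """Wrap each synchronization operation, and each maximal run of other
    operations, in its own BeginRegion/EndRegion pair."""
    out, in_region = [], False
    for op in ops:
        is_sync = op.startswith(SYNC_OPS)
        if is_sync and in_region:
            out.append("EndRegion")      # close the preceding access region
            in_region = False
        if not in_region:
            out.append("BeginRegion")
            in_region = True
        out.append(op)
        if is_sync:
            out.append("EndRegion")      # synchronization is a singleton region
            in_region = False
    if in_region:
        out.append("EndRegion")
    return out
```

For the stream `Rd Y, Wr X, Acquire(K), Rd T, Wr T, Release(K)` this yields four regions: the two accesses before the acquire, the acquire alone, the two accesses inside the critical section, and the release alone.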
13
Byte-granular access information is required
Figure: an N-byte Cache Line with N-bit Access Bits
Access bit vectors: Local Read, Local Write, Remote Read, Remote Write
Line-level Supplied Bit
Exception Test: compare appropriate local and remote bits
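The exception test can be sketched with one integer per bit vector, one bit per byte of the line. The pairing used here (a read is checked against remote write bits, a write against remote read and remote write bits) is an assumption based on the usual definition of conflicting accesses. `LineBits`, `byte_mask`, and `access` are hypothetical names, and the 4-byte line is purely illustrative.

```python
from dataclasses import dataclass

LINE_BYTES = 4  # illustrative stand-in for the slide's "N-byte cache line"

@dataclass
class LineBits:
    local_read: int = 0    # bit i set: this core read byte i in its region
    local_write: int = 0   # bit i set: this core wrote byte i in its region
    remote_read: int = 0   # bit i set: another core read byte i
    remote_write: int = 0  # bit i set: another core wrote byte i

def byte_mask(offset, size):
    """Bit mask covering bytes [offset, offset + size) of the line."""
    assert offset >= 0 and size > 0 and offset + size <= LINE_BYTES
    return ((1 << size) - 1) << offset

def access(line, is_write, offset, size=1):
    """Return True if the access must raise an exception; otherwise record
    it in the local bits and return False."""
    mask = byte_mask(offset, size)
    remote = line.remote_write | (line.remote_read if is_write else 0)
    if mask & remote:
        return True
    if is_write:
        line.local_write |= mask
    else:
        line.local_read |= mask
    return False
```

Because the comparison is per byte, a write to byte 0 does not conflict with a remote read of byte 2 in the same line, which is why byte-granular information avoids false positives from false sharing.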
14
Read Coherence Actions
CPU 1, CPU 2: Read Request, Read Reply
Bits shown: Local Write Bits, Remote Write Bits
Write Coherence Actions
CPU 1, CPU 2: Write/Invalidate, Invalidate Ack
Bits shown: Local Write Bits, Local Read Bits
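One reading of these labels is that access bits piggyback on ordinary coherence replies. The sketch below assumes that a Read Reply carries the responder's local write bits, which the requester records as remote write bits, and that an Invalidate Ack carries the responder's local read and write bits. The direction of each transfer is inferred from the slide labels rather than stated there, and `new_line`, `read_miss`, and `write_invalidate` are hypothetical names.

```python
def new_line():
    # Per-line access bit vectors, one bit per byte, modelled as integers.
    return {"LR": 0, "LW": 0, "RR": 0, "RW": 0}

def read_miss(requester, responder):
    """Read Request / Read Reply: assumed to return the responder's local
    write bits, which the requester merges into its remote write bits."""
    requester["RW"] |= responder["LW"]

def write_invalidate(requester, responder):
    """Write/Invalidate / Invalidate Ack: assumed to return the responder's
    local read and write bits, merged into the requester's remote bits."""
    requester["RR"] |= responder["LR"]
    requester["RW"] |= responder["LW"]
```

After either exchange the requester holds enough remote information to run the exception test locally, without another message per access.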
15
Ending a Region
CPU 1, CPU 2: End-Of-Region Message, End-Of-Region Ack
Labels: Address, Local Write Bits, Local Read Bits
For all supplied lines...
Clears Remote Bits Specified in EOR Msg
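Ending a region can be sketched for the two-cache case as follows. The assumptions are that the supplied bit marks lines whose access bits were handed to another cache during the region, that an EoR message names the line address plus the ending region's local read and write bits, and that the receiver clears exactly those remote bits. `Cache`, `end_region`, and `handle_eor` are hypothetical names, and acks are not modelled.

```python
class Cache:
    def __init__(self):
        self.lines = {}  # line address -> access bits and supplied flag

    def line(self, addr):
        return self.lines.setdefault(
            addr, {"LR": 0, "LW": 0, "RR": 0, "RW": 0, "sup": False})

def handle_eor(peer, addr, read_bits, write_bits):
    """Receiver side: clear the remote bits named in the EoR message."""
    if addr in peer.lines:
        peer.lines[addr]["RR"] &= ~read_bits
        peer.lines[addr]["RW"] &= ~write_bits

def end_region(cache, peers):
    """Send one EoR message per supplied line, then clear the local bits.
    Returns how many lines needed a message."""
    sent = 0
    for addr, bits in cache.lines.items():
        if bits["sup"]:
            for peer in peers:
                handle_eor(peer, addr, bits["LR"], bits["LW"])
            sent += 1
        bits["LR"] = bits["LW"] = 0
        bits["sup"] = False
    return sent
```

Lines that were never supplied end their region silently, so in this sketch the message cost grows with the number of supplied lines rather than with the number of lines touched.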
16
Figure (animated example): CPU 1’s Cache and CPU 2’s Cache each hold a line with bytes A B C D, per-byte LR, LW, RR, RW bits, and a Sup bit
Code shown for CPU 1 and CPU 2: BeginRegion, EndRegion, Wr A, Rd C, Wr C
Messages shown in order across the frames: Rd Req, Rd Reply, EoR, Invalidate, Inv Ack
Final frame: Exception!
17
Figure: CPU 1 and CPU 2, each with a Local Table Ptr and a Global Table Ptr; Main Memory holds Local Table 1, Local Table 2, and the Global Table
Per-thread local table tracks evicted accessed addresses
Per-process global table stores evicted lines’ access bits
EoR messages for regions with evictions are expensive
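The two tables can be pictured as a dictionary of saved access bits shared by the process plus a per-thread set of evicted addresses. This is a rough sketch under assumptions: the table layouts, the class name `OutOfCacheState`, and its methods are hypothetical, and what happens to global-table entries when a region ends is left out because the slides do not describe it.

```python
class OutOfCacheState:
    def __init__(self):
        self.global_table = {}  # per-process: line address -> saved access bits
        self.local_tables = {}  # per-thread: thread id -> evicted line addresses

    def evict(self, tid, addr, bits):
        """Save a line's access bits on eviction if any bit is set."""
        if any(bits.values()):
            self.global_table[addr] = dict(bits)
            self.local_tables.setdefault(tid, set()).add(addr)

    def refill(self, addr):
        """In-memory access bit lookup when a line is brought back in."""
        return dict(self.global_table.get(addr, {}))

    def end_region(self, tid):
        """Return the evicted addresses this thread's region must still
        handle at end of region, and forget them."""
        return sorted(self.local_tables.pop(tid, set()))
```

A region with no evictions gets an empty list from `end_region`, which matches the slide's point that only regions with evictions pay the expensive path.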
18
Protocol verified with Zing model checker
Simulator built using SESC and Pin
Evaluated using PARSEC, MySQL and Apache
19
Figure: % Traffic Overhead (axis ticks 2, 4, 6, 8) for x264, swaptions, ferret, freqmine, canneal, blackscholes, facesim, streamcluster, vips, MySQL, dedup, bodytrack, fluidanimate, Apache, and Mean
~5% traffic overhead on average
20
Figure: % In-Memory Acc Bit Lookups (axis ticks 0.3, 0.6, 0.9, 1.2, 1.5) for streamcluster, ferret, canneal, swaptions, x264, facesim, freqmine, vips, Apache, dedup, blackscholes, bodytrack, MySQL, fluidanimate, and Mean
Costly access bit lookups are very infrequent - 1.5% in the worst case
21
Simplified language specifications
Easier to debug data races
Limit damage caused by race bugs
When a data-race occurs, throw an exception
22
Programming model suitability analysis
More in-depth performance characterization
Formal proof that exception-free executions are SC
Further protocol implementation details
24
Figure: % Memory Overhead (axis ticks 6, 12, 18, 24, 30) for ferret, freqmine, facesim, streamcluster, x264, Apache, canneal, bodytrack, fluidanimate, vips, dedup, swaptions, blackscholes, MySQL, and Mean
25
Figure: % Lines of Code w/ Exceptions (axis ticks 0.02, 0.04, 0.06, 0.08, 0.10) for Apache, streamcluster, ferret, dedup, bodytrack, vips, facesim, freqmine, x264, canneal, fluidanimate, swaptions, blackscholes, MySQL, and Mean