slide-1
SLIDE 1

Conflict Exceptions: Simplifying Concurrent Language Semantics with Precise Hardware Exceptions for Data-Races

Brandon Lucia, Luis Ceze, Karin Strauss, Shaz Qadeer and Hans-J. Boehm

slide-2
SLIDE 2

Data-Races are Trouble

  • Complicated language specifications
  • Usually incorrect, and difficult to debug
  • Negative impact on system reliability

slide-3
SLIDE 3

What If...

Fail-Stop Semantics for Data-Races

  • Semantics are clear and simple
  • Better data-race debugging
  • Safety: races can’t cause problems

When a data-race occurs, throw an exception

slide-4
SLIDE 4

Requirements

  • High-Performance: always-on detection
  • Precise detection: no false positives

slide-5
SLIDE 5

Prior Work

Approach                                        Performance   Precision
Happens-Before [Elmas’07, Flanagan’09]          ✗             ✓
Approx. Methods [Savage’97, Zhou’07, Yu’05]     ✓             ✗
Conflict Exceptions [ISCA ’10]                  ✓             ✓

slide-8
SLIDE 8

Conflict Exceptions

[Figure: timelines for Thread 1 and Thread 2. Synchronization operations (Acquire/Release on K, L and M) delimit Synchronization-Free Regions containing the accesses Rd Y, Wr X, Rd T, Wr T, Rd Y, Wr Y, ..., Rd X, Wr Y, ... Annotations: "Conflict!", "Exception Delivered Here", "Undetected Race".]

  • Precisely detect only races that can affect consistency
  • Ignoring unimportant races is key to performance

The Guarantee:

  • Exception thrown? There was a data-race.
  • Exception-free? Sequential Consistency.
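
The rule on this slide can be sketched in a few lines of Python: an exception is raised only when two regions from different threads overlap in time and touch the same location with at least one write. The `Region` class, its logical timestamps and the helper names are illustrative assumptions, not part of the talk or of the hardware design.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One synchronization-free region (SFR) of a single thread (hypothetical model)."""
    thread: int
    start: int            # logical time at which the region begins
    end: int              # logical time at which the region ends
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def overlap_in_time(a: Region, b: Region) -> bool:
    """True if the two regions are concurrent (their intervals intersect)."""
    return a.start < b.end and b.start < a.end


def conflicting_locations(a: Region, b: Region) -> set:
    """Locations touched by both regions with at least one write."""
    return (a.writes & (b.reads | b.writes)) | (b.writes & a.reads)


def raises_conflict_exception(a: Region, b: Region) -> bool:
    """Exception only for regions in different threads that are concurrent and conflicting."""
    return (a.thread != b.thread
            and overlap_in_time(a, b)
            and bool(conflicting_locations(a, b)))


# Thread 1 writes X in a region spanning logical times 0..10.
t1 = Region(thread=1, start=0, end=10, writes={"X"})
# Thread 2 reads X while that region is still running: conflict exception.
t2_concurrent = Region(thread=2, start=5, end=15, reads={"X"})
# Thread 2 reads X after Thread 1's region has ended: nothing is reported.
t2_later = Region(thread=2, start=12, end=20, reads={"X"})

print(raises_conflict_exception(t1, t2_concurrent))  # True
print(raises_conflict_exception(t1, t2_later))       # False
```

The second case corresponds to the "Undetected Race" annotation: the accesses may still race, but because the regions do not overlap the execution remains sequentially consistent and no exception is needed.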

slide-14
SLIDE 14

Language Level Benefits

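  • Reordering in SFRs is legal
  • Granularity independence
  • Exception-free executions are SC

[Figure: code fragments accompanying the points above: Rd Y and Wr X alongside Acquire(K) and Release(K); Wr64_Low X and Wr64_Hi X alongside Acquire(K) and Release(K); and the sequence Acq(K), Rel(K), Rd X, Wr X shown twice.]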

slide-15
SLIDE 15

Language Level Benefits

  • Programming is the same
  • Racy programs are well-behaved
  • Race semantics are simpler

[Figure: code fragments: pthread_lock(K) and pthread_unlock(K) with Rd Y and Wr X; and Wr Q, Wr Z, Acq(K), Rd X, Wr X, Acq(L), Rd X with a "!" marker.]

slide-16
SLIDE 16

Debugging and Reliability

  • Concurrent, conflicting SFRs throw exceptions
  • All races have some exceptional schedule
  • Exception Handling: Log + Recover
  • Damage Control: Shut down buggy module

[Figure: Acq(K), Rd X, Wr X, Acq(L), Rd X with a "!" marker.]

slide-17
SLIDE 17

System Support for Conflict Exceptions


slide-18
SLIDE 18

Hardware/Software Interface

New Instructions: BeginRegion and EndRegion

  • Synchronization Operations are Singleton Regions
  • Exceptions Thrown Precisely Before Conflicting Instruction

[Figure: instruction stream containing Rd Y, Wr X, Rd T, Wr T, Acquire(K) and Release(K), delimited by BeginRegion/EndRegion pairs.]
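
As a rough illustration of how a compiler or runtime might place the two instructions, the sketch below wraps each run of ordinary accesses in one region and each synchronization operation in a singleton region of its own. The function names, the string encoding of operations and the example trace are assumptions made for this sketch; only `BeginRegion`, `EndRegion` and the operation labels come from the slides.

```python
SYNC_PREFIXES = ("Acquire", "Release")  # assumed naming, taken from the slide's diagram


def is_sync(op: str) -> bool:
    """True for synchronization operations such as Acquire(K) or Release(K)."""
    return op.startswith(SYNC_PREFIXES)


def delimit_regions(ops):
    """Wrap each run of ordinary accesses, and each sync op alone, in BeginRegion/EndRegion."""
    out = []
    in_region = False
    for op in ops:
        if is_sync(op):
            if in_region:
                out.append("EndRegion")      # close the region preceding the sync op
                in_region = False
            out += ["BeginRegion", op, "EndRegion"]  # singleton region
        else:
            if not in_region:
                out.append("BeginRegion")
                in_region = True
            out.append(op)
    if in_region:
        out.append("EndRegion")
    return out


# Hypothetical single-thread trace built from the operations named on the slide.
trace = ["Rd Y", "Wr X", "Acquire(K)", "Rd T", "Wr T", "Release(K)"]
for op in delimit_regions(trace):
    print(op)
```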

slide-20
SLIDE 20

Access Monitoring

Byte-granular access information is required

  • N-byte Cache Line, N-bit Access Bits
  • Access bit vectors: Local Read, Local Write, Remote Read, Remote Write
  • Line-level Supplied Bit
  • Exception Test: compare appropriate local and remote bits
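
A minimal sketch of the exception test, assuming one bit per byte in each of the four vectors and the usual definition of a conflict (two accesses to the same byte, at least one of them a write). The class, field and function names and the 4-byte line size are illustrative assumptions; the slides do not give this level of detail.

```python
from dataclasses import dataclass

LINE_BYTES = 4  # N; kept tiny for illustration (an assumption, real lines are larger)


@dataclass
class LineBits:
    """Access bits for one cache line: one bit per byte in each of four vectors."""
    local_read: int = 0
    local_write: int = 0
    remote_read: int = 0
    remote_write: int = 0
    supplied: bool = False  # line-level bit: the line was supplied to another cache


def byte_mask(offset: int, size: int = 1) -> int:
    """Bit mask covering `size` bytes starting at byte `offset` within the line."""
    assert 0 <= offset and offset + size <= LINE_BYTES
    return ((1 << size) - 1) << offset


def conflicts(bits: LineBits, mask: int, is_write: bool) -> bool:
    """A read conflicts with remote writes; a write with remote reads or writes."""
    remote = bits.remote_write | (bits.remote_read if is_write else 0)
    return bool(mask & remote)


def access(bits: LineBits, offset: int, is_write: bool, size: int = 1) -> None:
    """Test for a conflict before the access, then record it in the local bits."""
    mask = byte_mask(offset, size)
    if conflicts(bits, mask, is_write):
        raise RuntimeError("conflict exception")
    if is_write:
        bits.local_write |= mask
    else:
        bits.local_read |= mask


line = LineBits(remote_read=byte_mask(2))   # another region has read byte 2
access(line, 0, is_write=True)              # different byte: no conflict
access(line, 2, is_write=False)             # read vs. remote read: no conflict
try:
    access(line, 2, is_write=True)          # write vs. remote read: conflict
except RuntimeError as err:
    print(err)
```

Tracking bytes rather than whole lines is what lets the first access succeed: it touches the same line as the remote read but a different byte, so no false positive is reported.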

slide-21
SLIDE 21

Coherence Support

[Figure: Read Coherence Actions. CPU 1 and CPU 2 exchange a Read Request and a Read Reply; labels: Local Write Bits, Remote Write Bits.]

[Figure: Write Coherence Actions. CPU 1 and CPU 2 exchange a Write/Invalidate and an Invalidate Ack; labels: Local Write Bits, Local Read Bits.]

slide-23
SLIDE 23

Ending a Region

[Figure: CPU 1 and CPU 2 exchange an End-Of-Region Message (labels: Address, Local Write Bits, Local Read Bits) and an End-Of-Region Ack.]

  • For all supplied lines: send an End-Of-Region (EoR) message
  • Receiver clears the remote bits specified in the EoR message

slide-24
SLIDE 24

Putting It Together

[Figure: animated walk-through. CPU 1's cache and CPU 2's cache each hold a line with bytes A, B, C, D, per-byte access bits LR, LW, RR, RW, and a supplied (Sup) bit. The code panels for CPU 1 and CPU 2 contain BeginRegion, EndRegion, Wr A, Rd C and Wr C.]

Messages shown, in order:

  • Rd Req
  • Rd Reply
  • EoR
  • Invalidate
  • Inv Ack
  • Exception!
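
The walk-through can be approximated with a toy two-cache model. This is a sketch under several assumptions that the slide residue does not confirm: CPU 1 runs BeginRegion, Wr A, EndRegion, BeginRegion, Wr C while CPU 2 runs BeginRegion, Rd C; the first write needs no coherence traffic; and access bits are piggybacked on replies and acks. The `Cache` class and its methods are hypothetical names.

```python
messages = []  # shared log of protocol events, in the order they occur


class ConflictException(Exception):
    pass


class Cache:
    """One CPU's view of a single line whose bytes are named A, B, C, D."""

    def __init__(self, name):
        self.name = name
        # LR/LW: local read/write bits; RR/RW: remote read/write bits.
        self.bits = {k: set() for k in ("LR", "LW", "RR", "RW")}

    def check(self, byte, is_write):
        # A write conflicts with any remote access; a read only with a remote write.
        remote = self.bits["RW"] | (self.bits["RR"] if is_write else set())
        if byte in remote:
            messages.append("Exception!")
            raise ConflictException(f"{self.name}: conflicting access to {byte}")

    def read(self, byte, other):
        # The supplier's local write bits arrive with the reply as remote write bits.
        messages.extend(["Rd Req", "Rd Reply"])
        self.bits["RW"] |= other.bits["LW"]
        self.check(byte, is_write=False)
        self.bits["LR"].add(byte)

    def write(self, byte, other=None):
        # With other=None the line is assumed to be held exclusively: no messages.
        if other is not None:
            messages.extend(["Invalidate", "Inv Ack"])
            self.bits["RR"] |= other.bits["LR"]
            self.bits["RW"] |= other.bits["LW"]
        self.check(byte, is_write=True)   # exception is raised before the write
        self.bits["LW"].add(byte)

    def end_region(self, other):
        # The other cache clears the remote bits named in the EoR message.
        messages.append("EoR")
        other.bits["RR"] -= self.bits["LR"]
        other.bits["RW"] -= self.bits["LW"]
        self.bits["LR"].clear()
        self.bits["LW"].clear()


cpu1, cpu2 = Cache("CPU 1"), Cache("CPU 2")
cpu1.write("A")              # CPU 1, first region: Wr A
cpu2.read("C", cpu1)         # CPU 2: Rd C (same line, different byte: no conflict)
cpu1.end_region(cpu2)        # CPU 1: EndRegion
try:
    cpu1.write("C", cpu2)    # CPU 1, second region: Wr C while CPU 2's region is open
except ConflictException as err:
    print(err)
print(messages)
```

Under these assumptions the log reproduces the message order on the slides. The write to C is the only conflicting access, because CPU 2's region that read C is still open when CPU 1 receives the acknowledgement.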

slide-44
SLIDE 44

Out-Of-Cache Operation

  • Per-thread local table tracks evicted accessed addresses
  • Per-process global table stores evicted lines’ access bits
  • EoR messages for regions with evictions are expensive

[Figure: CPU 1 and CPU 2 each hold a Local Table Ptr and a Global Table Ptr; Main Memory holds Local Table 1, Local Table 2 and the Global Table.]
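
One way to picture the two tables is the simplified model below. It is a sketch only: the class, the per-thread layout of global-table entries and the string stand-ins for access bits are assumptions, and the real structures live in main memory and are reached through the pointers shown in the figure.

```python
class OutOfCacheTables:
    """Hypothetical model of the per-thread local tables and the per-process global table."""

    def __init__(self):
        self.global_table = {}   # line address -> {thread id: saved access bits}
        self.local_tables = {}   # thread id -> set of evicted line addresses

    def evict(self, thread, addr, bits):
        """On eviction, save the line's access bits and remember the address for the thread."""
        self.global_table.setdefault(addr, {})[thread] = bits
        self.local_tables.setdefault(thread, set()).add(addr)

    def lookup(self, addr):
        """In-memory access bit lookup for a line that is no longer cached."""
        return self.global_table.get(addr, {})

    def end_region(self, thread):
        """Walk the thread's local table and clear its bits; return the number of lines visited."""
        evicted = self.local_tables.pop(thread, set())
        for addr in evicted:
            per_thread = self.global_table.get(addr, {})
            per_thread.pop(thread, None)
            if not per_thread:
                self.global_table.pop(addr, None)
        return len(evicted)


tables = OutOfCacheTables()
tables.evict(thread=1, addr=0x40, bits="LW")   # bits shown as strings for brevity
tables.evict(thread=2, addr=0x40, bits="LR")
print(tables.lookup(0x40))
print(tables.end_region(1))
print(tables.lookup(0x40))
```

In this model the cost of ending a region grows with the number of lines evicted during it, which is consistent with the slide's remark that EoR messages for regions with evictions are expensive.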

slide-45
SLIDE 45

Evaluation

  • Protocol verified with Zing model checker
  • Simulator built using SESC and Pin
  • Evaluated using PARSEC, MySQL and Apache

slide-46
SLIDE 46

Overheads

[Figure: bar chart of % Traffic Overhead (axis ticks 2 to 8) for x264, swaptions, ferret, freqmine, canneal, blackscholes, facesim, streamcluster, vips, MySQL, dedup, bodytrack, fluidanimate, Apache and the mean.]

~5% traffic overhead on average

slide-47
SLIDE 47

Performance Impact

[Figure: bar chart of % In-Memory Acc Bit Lookups (axis ticks 0.3 to 1.5) for streamcluster, ferret, canneal, swaptions, x264, facesim, freqmine, vips, Apache, dedup, blackscholes, bodytrack, MySQL, fluidanimate and the mean.]

Costly access bit lookups are very infrequent: 1.5% in the worst case

slide-48
SLIDE 48

Conflict Exceptions

  • Simplified language specifications
  • Easier to debug data races
  • Limit damage caused by race bugs

When a data-race occurs, throw an exception

slide-49
SLIDE 49

Also In The Paper!

  • Programming model suitability analysis
  • More in-depth performance characterization
  • Formal proof that exception-free executions are SC
  • Further protocol implementation details

slide-51
SLIDE 51

Memory Overhead

[Figure: bar chart of % Memory Overhead (axis ticks 6 to 30) for ferret, freqmine, facesim, streamcluster, x264, Apache, canneal, bodytrack, fluidanimate, vips, dedup, swaptions, blackscholes, MySQL and the mean.]

slide-52
SLIDE 52

Suitability

[Figure: bar chart of % Lines of Code w/ Exceptions (axis ticks 0.02 to 0.10) for Apache, streamcluster, ferret, dedup, bodytrack, vips, facesim, freqmine, x264, canneal, fluidanimate, swaptions, blackscholes, MySQL and the mean.]