SLIDE 1

Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically

Abdullah Muzahid, Shanxiang Qi, and Josep Torrellas

University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

MICRO December 2012

SLIDE 2

Josep Torrellas Vulcan: Detecting SC Violations

Sequential Consistency (SC)

PA: A0: x = 1; A1: y = 1
PB: B0: p = y; B1: t = x

  • In SC, memory accesses:
    – Appear atomic
    – Have a total global order
    – For each thread, follow program order

Two example SC orders: A0 A1 B0 B1 and A0 B0 A1 B1
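As a minimal sketch (not part of Vulcan), the SC conditions can be checked for a proposed global order of the example's accesses. The names `PROGRAM` and `is_sc_order` are hypothetical; atomicity is modeled implicitly by treating each access as a single point in the order.

```python
# Program order of each thread in the slide's example (access labels only).
PROGRAM = {"PA": ["A0", "A1"], "PB": ["B0", "B1"]}


def is_sc_order(order, program=PROGRAM):
    """Return True if `order` is one total order containing every access
    exactly once and preserving each thread's program order."""
    all_accesses = [a for ops in program.values() for a in ops]
    if sorted(order) != sorted(all_accesses):
        return False  # not a total order over exactly these accesses
    pos = {access: i for i, access in enumerate(order)}
    # Each access must come after its program-order predecessor.
    return all(pos[ops[i]] < pos[ops[i + 1]]
               for ops in program.values()
               for i in range(len(ops) - 1))


for order in (["A0", "A1", "B0", "B1"],
              ["A0", "B0", "A1", "B1"],
              ["A1", "B0", "B1", "A0"]):
    print(" ".join(order), is_sc_order(order))
```

The first two orders are the ones listed on the slide; the third places A1 before A0 and therefore is not an SC order.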

SLIDE 3

Sequential Consistency Violation (SCV)

  • SCV: an access reordering that does not conform to SC
  • Machines support relaxed memory models, not SC
  • So machines may induce SC violations (SCVs)

Initially x = y = 0
PA: A0: x = 1; A1: y = 1
PB: B0: p = y; B1: t = x

Observed order A1 B0 B1 A0: p is 1, t is 0

Very unintuitive bug: in SC, if p = 1 then t = 1
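The claim can be checked by brute force. The sketch below enumerates every execution of the example under SC and under a toy relaxed model. The toy model is an assumption, not any real ISA: it simply lets each thread perform its two accesses, which go to different addresses, in either order. Only the relaxed model produces p = 1, t = 0.

```python
from itertools import combinations, permutations

# (label, kind, variable, destination register); initially x = y = 0.
PA = (("A0", "store", "x", None), ("A1", "store", "y", None))
PB = (("B0", "load", "y", "p"), ("B1", "load", "x", "t"))


def merges(a, b):
    """Yield every interleaving of sequences a and b that keeps each one's order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia = ib = 0
        order = []
        for i in range(n):
            if i in slots:
                order.append(a[ia])
                ia += 1
            else:
                order.append(b[ib])
                ib += 1
        yield order


def run(order):
    """Execute one global order of accesses and return (p, t)."""
    mem, regs = {"x": 0, "y": 0}, {}
    for _label, kind, var, reg in order:
        if kind == "store":
            mem[var] = 1
        else:
            regs[reg] = mem[var]
    return regs["p"], regs["t"]


def outcomes(relaxed):
    # Toy relaxed model (assumption): a thread's two accesses may be reordered.
    pa_orders = list(permutations(PA)) if relaxed else [PA]
    pb_orders = list(permutations(PB)) if relaxed else [PB]
    return {run(o) for pa in pa_orders for pb in pb_orders for o in merges(pa, pb)}


print("SC:     ", sorted(outcomes(relaxed=False)))
print("relaxed:", sorted(outcomes(relaxed=True)))
```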

SLIDE 4

Example of SCV

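T1: buf = malloc(...); init = true
T2: if (init) ... = buf[...]

If the accesses in T1 or in T2 are reordered, T2 can see init == true and still use the old, uninitialized buf: Crash!!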

SLIDE 5

When Can an SCV Occur?

  • Two or more data races overlap
  • They create a cycle

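General pattern: PA: A0: ref(x); A1: ref(y)   PB: B0: ref(y); B1: ref(x)
Example: PA: A0: x = 1; A1: y = 1   PB: B0: p = y; B1: t = x
A fence between the two accesses of each thread (PA and PB) prevents the cycle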
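The cycle condition can be made concrete with a graph whose nodes are accesses and whose edges are program order plus one edge per race, pointing from the access performed first. This is a plain depth-first search sketch for illustration, not Vulcan's hardware mechanism; `has_cycle` is a hypothetical helper, and the edges come from the x/y example.

```python
def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as (src, dst) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on DFS stack / finished
    color = dict.fromkeys(graph, WHITE)

    def dfs(v):
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY:          # back edge: cycle found
                return True
            if color[w] == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in graph)


program_order = [("A0", "A1"), ("B0", "B1")]

# Order A1 B0 B1 A0: A1 (wr y) -> B0 (rd y) and B1 (rd x) -> A0 (wr x).
scv_races = [("A1", "B0"), ("B1", "A0")]
# Order A0 A1 B0 B1: A1 (wr y) -> B0 (rd y) and A0 (wr x) -> B1 (rd x).
sc_races = [("A1", "B0"), ("A0", "B1")]

print(has_cycle(program_order + scv_races))  # True
print(has_cycle(program_order + sc_races))   # False
```

The two races overlap in both executions, but only the reordered one closes the cycle A0 → A1 → B0 → B1 → A0.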

SLIDE 6

Why Is Detecting SCVs Important?

  • Programmers assume SC
    – An SCV is almost always a bug: an unexpected interleaving
    – Single-stepping debuggers cannot reproduce the bug

  • Causes portability problems (e.g., Intel TBB)
    – Code may not work across machines

  • Lock-free data structures sometimes use races explicitly but rely on SC
    – Traditional data race detectors won't work

  • Around 18% of reported races can cause an SCV (see paper)

SLIDE 7

Proposal: Vulcan

  • Detects SCVs on relaxed-consistency machines with high precision
    – No false positives; no false negatives
  • Provides information to the debugger to debug the SCV
  • No software changes: uses only the executable; negligible execution overhead
  • Idea: use HW to detect cycles of inter-thread dependences at runtime
  • Approach:
    – Use the cache coherence protocol to dynamically record dependences
    – Interrupt the processor when a cycle is about to occur

SLIDE 8

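Basic Algorithm

Threads PA: ref(x), ref(y) and PB: ref(y), ref(x). A dependence goes from access A0 in PA to access B0 in PB.

  • Region R1 (B0 and later accesses in PB): should not be the source of a dependence to Region R2
  • Region R2 (A0 and earlier accesses in PA): should not be the destination of a dependence from Region R1
  • Recorded as: Allowed Destination: AD > A0; Allowed Source: AS < B0

A second dependence, from A1 to B1, adds AD > A1 and AS < B1. Where the regions of the two dependences overlap: AD > max(A0, A1) and AS < min(B0, B1).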
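The region bookkeeping can be sketched in software. In the sketch below (hypothetical helper names, hypothetical sequence numbers A0 = 0, A1 = 1, B0 = 0, B1 = 1), each PB access derives its AD from the PA → PB dependences whose region R1 contains it, and each PA access derives its AS from the dependences whose region R2 contains it. Keeping the bounds per access, with max/min accumulation, is an assumption that matches the per-access updates on the Hardware Checks slide.

```python
NEG_INF, POS_INF = float("-inf"), float("inf")


def allowed_destination(deps, pb_sn):
    """AD seen by the PB access with sequence number pb_sn.

    deps holds observed PA -> PB dependences as (pa_sn, pb_sn) pairs. Every
    dependence whose destination is at or before pb_sn puts this access in
    that dependence's region R1, so a dependence from it into PA must target
    an SN strictly greater than the returned value."""
    return max((a for a, b in deps if b <= pb_sn), default=NEG_INF)


def allowed_source(deps, pa_sn):
    """AS seen by the PA access pa_sn: this access is in region R2 of every
    dependence whose source is at or after it, so a dependence from PB into
    it must come from an SN strictly smaller than the returned value."""
    return min((b for a, b in deps if a >= pa_sn), default=POS_INF)


def closes_cycle(deps, pb_src, pa_dst):
    """Would a new dependence PB[pb_src] -> PA[pa_dst] form a cycle?"""
    return pa_dst <= allowed_destination(deps, pb_src)


# Dependences A0 -> B0 and A1 -> B1 have been observed.
deps = [(0, 0), (1, 1)]
print(allowed_destination(deps, 1))   # 1: AD > max(A0, A1) for B1 and later
print(allowed_source(deps, 0))        # 0: AS < min(B0, B1) for A0 and earlier
print(closes_cycle(deps, 1, 0))       # True:  B1 -> A0 closes A0 -> B0 -> B1 -> A0
print(closes_cycle(deps, 0, 1))       # False: B0 -> A1 closes no cycle
```

The AD test and the AS test are two views of the same condition: `pa_dst <= AD` holds exactly when `pb_src >= AS` holds for the destination access.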

SLIDE 9

Hardware Structures

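Per-access metadata (figure: accesses A0, A1 in PA and B0, B1 in PB):
  • SN: Sequence Number
  • AD: Allowed Destination, one entry per other processor (N - 1 entries)
  • AS: Allowed Source, one entry per other processor (N - 1 entries)
  • N: # of processors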

SLIDE 10

Hardware Checks

Dependence from Bj in PB (producer) to Ai in PA (consumer); P denotes the other processor:

  1. Request: the consumer sends Ai's SN
  2. Action at producer: If (SN <= AD[P] of Bj) exception; Else AS[P] of Bj and earlier = min[curr_value, SN]. All cases: send response + SN of Bj
  3. Action at consumer: If (SN >= AS[P] of Ai) exception; Else AD[P] of Ai and later = max[curr_value, SN]
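A software sketch of the three steps follows, for two processors. It is a simplification under stated assumptions: every access keeps an entry for the whole run (no Safe-based retirement), and `Entry`, `Proc`, `dependence`, and `run` are hypothetical names. The two runs replay the dependences of the x/y example for an SC order and for the reordered order A1 B0 B1 A0.

```python
from dataclasses import dataclass, field

NEG_INF, POS_INF = float("-inf"), float("inf")


class SCViolation(Exception):
    """Raised where the hardware would raise the exception."""


@dataclass
class Entry:
    sn: int                                   # sequence number
    ad: dict = field(default_factory=dict)    # other processor -> AD value
    as_: dict = field(default_factory=dict)   # other processor -> AS value


class Proc:
    def __init__(self, name, n_accesses):
        self.name = name
        # Simplification: keep an entry for every access, in program order.
        self.entries = [Entry(sn) for sn in range(n_accesses)]


def dependence(prod, p_sn, cons, c_sn):
    """Apply the slide's three steps for a dependence prod[p_sn] -> cons[c_sn]."""
    # 1. The consumer's request carries its SN (c_sn).
    # 2. Action at producer.
    if c_sn <= prod.entries[p_sn].ad.get(cons.name, NEG_INF):
        raise SCViolation(f"{prod.name}[{p_sn}] -> {cons.name}[{c_sn}]: producer check")
    for e in prod.entries[: p_sn + 1]:        # producer access and earlier
        e.as_[cons.name] = min(e.as_.get(cons.name, POS_INF), c_sn)
    # The response carries the producer's SN (p_sn).
    # 3. Action at consumer.
    if p_sn >= cons.entries[c_sn].as_.get(prod.name, POS_INF):
        raise SCViolation(f"{prod.name}[{p_sn}] -> {cons.name}[{c_sn}]: consumer check")
    for e in cons.entries[c_sn:]:             # consumer access and later
        e.ad[prod.name] = max(e.ad.get(prod.name, NEG_INF), p_sn)


def run(deps):
    procs = {"PA": Proc("PA", 2), "PB": Proc("PB", 2)}
    try:
        for prod, p_sn, cons, c_sn in deps:
            dependence(procs[prod], p_sn, procs[cons], c_sn)
    except SCViolation as exc:
        return str(exc)
    return "no SCV"


# Order A0 A1 B0 B1: A1 -> B0 (y), then A0 -> B1 (x).
print(run([("PA", 1, "PB", 0), ("PA", 0, "PB", 1)]))
# Order A1 B0 B1 A0: A1 -> B0 (y), then B1 -> A0 (x, a WAR dependence).
print(run([("PA", 1, "PB", 0), ("PB", 1, "PA", 0)]))
```

In the second run, the first dependence sets AD[PA] = 1 on B0 and later, so when B1 later acts as producer toward A0 (SN 0), the producer check fires.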

SLIDE 11

Safe Accesses

  • An access is Safe when it can no longer cause an SCV, i.e., when:
    – The access and all its predecessors are performed, and
    – All of its disallowed destinations (in all the other processors) are performed

SLIDE 12

SC Violations and Safe Accesses

  • When an SCV occurs, the following must be true:
    – In each of the two arrows (dependences) that form the cycle, the source reference is Unsafe with respect to the destination processor

SLIDE 13

How Long to Keep Metadata?

  • Keep metadata as long as the access can participate in an SCV
  • That is, keep metadata for Unsafe accesses only
  • Store metadata in a per-processor SC Violation Queue (SCVQ)
    – Contains address + SN + AD[] + AS[], not data

Unsafe accesses = Pending accesses + accesses whose disallowed destinations are not yet performed
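The retention rule can be sketched as a filter over SCVQ entries that applies the Safe definition from the Safe Accesses slide. Two pieces are assumptions made only for this sketch: a per-entry `performed` bit, and a `performed_upto` map giving, for each other processor, the highest SN up to which all of its accesses are performed. Neither is claimed to be how the hardware tracks this.

```python
from dataclasses import dataclass, field


@dataclass
class SCVQEntry:
    address: int
    sn: int
    ad: dict = field(default_factory=dict)    # other processor -> AD value
    as_: dict = field(default_factory=dict)   # other processor -> AS value
    performed: bool = False                   # sketch-only status bit (assumption)


def is_safe(entry, queue, performed_upto):
    """Safe = the access and all its predecessors are performed, and every
    disallowed destination (SN <= AD[p]) in each other processor p is performed.
    performed_upto[p] is a hypothetical summary of p's progress."""
    # Predecessors already dropped from the queue were Safe, hence performed.
    preds_done = all(e.performed for e in queue if e.sn <= entry.sn)
    dests_done = all(performed_upto.get(p, -1) >= bound for p, bound in entry.ad.items())
    return preds_done and dests_done


def prune(queue, performed_upto):
    """Keep metadata only for accesses that are still Unsafe."""
    return [e for e in queue if not is_safe(e, queue, performed_upto)]


queue = [SCVQEntry(0x40, 0, performed=True),
         SCVQEntry(0x80, 1, ad={"PB": 3}, performed=True),
         SCVQEntry(0xC0, 2)]
queue = prune(queue, {"PB": 1})
print([e.sn for e in queue])   # [1, 2]: SN 1 waits for PB, SN 2 is pending
queue = prune(queue, {"PB": 3})
print([e.sn for e in queue])   # [2]
```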

SLIDE 14

Detecting Dependences and Cycles: Single Word Cache Line

  • Inter-thread dependence induces a coherence bus transaction
  • Bus transaction searches the SCVQs of other processors
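  • If it hits, the source and destination references exchange SNs and run the Vulcan algorithm

Example (figure): a RAW dependence between PA and PB (a wr followed by a rd to the same word). The same holds for WAW and WAR.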
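The search step can be sketched as follows: a bus transaction from one processor is matched against the other processors' SCVQ entries for the same address, and each match is a candidate dependence. The `is_write` field is an assumption added only to name the dependence type (the slides list the entry contents as address + SN + AD[] + AS[]); queued entries are assumed to be the earlier access.

```python
from collections import namedtuple

# Minimal SCVQ entry for this sketch; `is_write` is an assumed field.
Entry = namedtuple("Entry", "address sn is_write")


def dependence_kind(earlier_is_write, later_is_write):
    if earlier_is_write and later_is_write:
        return "WAW"
    if earlier_is_write:
        return "RAW"
    if later_is_write:
        return "WAR"
    return None          # two reads: no dependence


def snoop(scvqs, requester, address, is_write):
    """A bus transaction from `requester` searches the other processors' SCVQs.
    Each hit is a candidate dependence whose endpoints would then exchange SNs."""
    hits = []
    for proc, queue in scvqs.items():
        if proc == requester:
            continue
        for e in queue:
            kind = dependence_kind(e.is_write, is_write)
            if e.address == address and kind is not None:
                hits.append((proc, e.sn, kind))
    return hits


scvqs = {"PA": [Entry(0x100, 0, True), Entry(0x200, 1, True)],
         "PB": [Entry(0x200, 0, False)]}
print(snoop(scvqs, "PB", 0x100, False))  # [('PA', 0, 'RAW')]
print(snoop(scvqs, "PA", 0x200, True))   # [('PB', 0, 'WAR')]
```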

SLIDE 15

SC Violation Queue (SCVQ)

  • Keeps Vulcan metadata for Unsafe local loads/stores
  • Needs efficient search; cannot rely on the cache snooper
  • Counting Bloom filter to minimize useless SCVQ lookups
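A counting Bloom filter fits here because SCVQ entries are removed when they become Safe, and counters (unlike plain bits) can be decremented. The sketch below is illustrative: the filter size, the number of hashes, and the use of Python's `hash` are assumptions, not the paper's parameters. A negative filter answer is definite, so the full queue search is skipped.

```python
class CountingBloomFilter:
    """Counting Bloom filter over addresses: no false negatives, and entries
    can be removed, which a plain Bloom filter cannot support."""

    def __init__(self, num_counters=64, num_hashes=3):
        self.counts = [0] * num_counters
        self.num_hashes = num_hashes

    def _slots(self, address):
        # Python hashes tuples of ints deterministically; hardware would use
        # cheap hash functions instead.
        return [hash((i, address)) % len(self.counts) for i in range(self.num_hashes)]

    def add(self, address):
        for s in self._slots(address):
            self.counts[s] += 1

    def remove(self, address):
        for s in self._slots(address):
            self.counts[s] -= 1

    def may_contain(self, address):
        return all(self.counts[s] > 0 for s in self._slots(address))


class SCVQ:
    def __init__(self):
        self.entries = []                 # (address, sn) pairs
        self.filter = CountingBloomFilter()
        self.searches = 0                 # full queue searches actually done

    def insert(self, address, sn):
        self.entries.append((address, sn))
        self.filter.add(address)

    def remove(self, address, sn):
        self.entries.remove((address, sn))
        self.filter.remove(address)

    def lookup(self, address):
        if not self.filter.may_contain(address):
            return []                     # definitely absent: skip the search
        self.searches += 1
        return [sn for a, sn in self.entries if a == address]


q = SCVQ()
print(q.lookup(0x40), q.searches)   # [] 0: the empty filter avoids the search
q.insert(0x40, 0)
q.insert(0x80, 1)
print(q.lookup(0x40))               # [0]
```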

SLIDE 16

Detecting Dependences and Cycles: Multiword Cache Line

  • When the destination reference of an inter-thread dependence occurs, either:
    – The coherence protocol triggers a coherence bus transaction, or
    – Vulcan forces a metadata-update bus access
  • Implementation: Vulcan adds V-State per byte in each line
    – Tracks whether the latest dependence on that word has already been recorded
    – If it has not been recorded when the processor accesses the word, Vulcan forces a metadata bus access, even if there is no coherence action

Example (figure): accesses wr1, wr2, rd3, rd4 by PA and PB. See paper for details.
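Why per-byte V-State is needed can be seen in a toy trace: after one coherence transaction brings a whole line into a cache, a later access to another word of that line hits in the cache, yet may be the destination of a new, unrecorded dependence. The sketch below assumes a simple write-invalidate protocol, tracks only dependences whose source is a write, and uses a hypothetical trace loosely modeled on the slide's wr1, wr2, rd3, rd4. `Line` and `access` are illustrative names only.

```python
LINE_BYTES = 32          # cache line size from the modeled configuration


class Line:
    """One cache line shared by several processors (toy write-invalidate model)."""

    def __init__(self, procs):
        self.valid = dict.fromkeys(procs, False)        # does proc hold a copy?
        self.last_writer = [None] * LINE_BYTES          # per-byte last writer
        # V-state: has proc's latest dependence on this byte been recorded?
        self.recorded = {p: [True] * LINE_BYTES for p in procs}

    def access(self, proc, byte, is_write):
        """Return the bus action caused: 'coherence', 'metadata', or None."""
        others_valid = any(v for q, v in self.valid.items() if q != proc)
        unrecorded = (self.last_writer[byte] not in (None, proc)
                      and not self.recorded[proc][byte])
        if not self.valid[proc] or (is_write and others_valid):
            action = "coherence"      # miss or upgrade: the transaction records it
        elif unrecorded:
            action = "metadata"       # cache hit, but Vulcan forces a bus access
        else:
            action = None
        self.valid[proc] = True
        self.recorded[proc][byte] = True
        if is_write:
            self.last_writer[byte] = proc
            for q in self.valid:
                if q != proc:
                    self.valid[q] = False            # invalidate other copies
                    self.recorded[q][byte] = False   # new, unrecorded dependence
        return action


line = Line(["PA", "PB"])
trace = [("PA", 0, True),    # wr1
         ("PA", 4, True),    # wr2
         ("PB", 0, False),   # rd3
         ("PB", 4, False)]   # rd4
print([line.access(*t) for t in trace])
# ['coherence', None, 'coherence', 'metadata']
```

Here rd4 hits in PB's cache, so the coherence protocol alone would miss the wr2 → rd4 dependence; the cleared V-State bit triggers the forced metadata access.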

SLIDE 17

Issues

  • Advantages:
    – Detects actual SC violations, not data races
    – Works for any memory model
    – Low overhead
  • Limitations:
    – Handles only race cycles involving two processors (the very large majority)
    – Does not consider the impact of compiler transformations on SCVs

Within these constraints: no false positives, no false negatives

SLIDE 18

Multicore Modeled

Modeled a multicore chip with 8 processors

  • Core: out-of-order, 2-issue
  • RC memory model
  • Private L1, Shared L2
  • Cache line size: 32 bytes
  • Byte-level V-State bits
  • SCVQ size: 256 entries

Category | Program | Description
Concurrent Algorithms | Dekker | Mutual exclusion
Concurrent Algorithms | Lazylist | List-based concurrent set
Concurrent Algorithms | Snark | Non-blocking double-ended queue
Concurrent Algorithms | Harris | Non-blocking set
Bug Kernels | Pthread (from glibc) | Unwind code after canceling a thread
Bug Kernels | Crypt (from glibc) | Small table initialization code
Bug Kernels | DCL bug | Kernel using double-checked locking
Full Apps | SPLASH-2 | 8 programs from SPLASH-2

SLIDE 19

Vulcan Effectively Detects SCVs

  • Vulcan detects 3 new bugs in important codes (libraries)

Program | SC Violations Found: Unique | SC Violations Found: Total | New?
Dekker | 1 | 224 |
Lazylist | 1 | 150 |
Snark | 1 | 1467 |
Harris | 1 | 18 |
Pthread | 2 | 142 | Y
Crypt | 2 | 130 | Y
DCL | 1 | 2 |
fmm (SPLASH-2) | 3 | 18 | Y

SLIDE 20

Example New Bug: Crypt Library Bug

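Reader thread:
    if (init == True)
        ... = tab[...]

Initializing thread:
    if (init == False)
        lock L
        if (init == False)
            tab[...] = ...
            fence
            init = True
        unlock L

Crash!!!

  • Found a new SCV in a bug fix
    – The reader's branch condition (init == True) is predicted TRUE although it is not yet TRUE
    – The THEN code reads the old (wrong) tab[]
    – The initializing thread then updates tab[] and sets init = True
    – The branch prediction is later confirmed correct, so the stale read is never squashed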

SLIDE 21

Overhead

App | Exec Overhead (%)
fft | 9.5
lu | 3.6
radix | 1.4
cholesky | 9.0
ocean | 12.3
raytrace | 6.9
barnes | 2.7
fmm | 2.8
avg | 6.0
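Assuming the reported avg is the unweighted arithmetic mean of the eight applications, the per-application numbers reproduce it:

```python
# Per-application execution overhead (%) from the table above.
overhead = {"fft": 9.5, "lu": 3.6, "radix": 1.4, "cholesky": 9.0,
            "ocean": 12.3, "raytrace": 6.9, "barnes": 2.7, "fmm": 2.8}

mean = sum(overhead.values()) / len(overhead)
print(f"arithmetic mean: {mean:.1f}%")   # arithmetic mean: 6.0%
```

The exact mean is 48.2 / 8 = 6.025, which rounds to the 6.0 shown.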

Traffic added:

  • 9% due to metadata piggybacked on coherence messages
  • 12% due to extra bus accesses

Low overhead: OK for on-the-fly use

SLIDE 22

Also in the Paper

  • Full description of the protocol for multi-word cache lines
  • Information that a debugger would get after the exception
  • HW structure sizes and cost
  • Comparison to related work

SLIDE 23

Conclusions

  • SCV bugs are arguably the hardest type of concurrency bugs
  • Vulcan is the first HW scheme to detect these bugs with high precision
    – No false positives; no false negatives
  • It has low execution overhead, suitable for on-the-fly deployment
    – 6% for 8-processor runs; 4.4% for 4-processor runs
  • It detects 3 previously unknown bugs in popular libraries

Lots of work to do!

SLIDE 24

Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically

Abdullah Muzahid, Shanxiang Qi, and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

MICRO December 2012