Compiler Techniques For Memory Consistency Models

SLIDE 1

Compiler Techniques For Memory Consistency Models

Students: Xing Fang (Purdue), Jaejin Lee (Seoul National University), Kyungwoo Lee (Purdue), Zehra Sura (IBM T.J. Watson Research Center), David Wong (Intel KAI Research Lab)

Faculty: Sam Midkiff (Purdue), David Padua (UIUC)

smidkiff@purdue.edu

SLIDE 2

Goal of this talk

  • A brief overview of techniques to enable stricter consistency models to be incorporated into Cell programming models

– Techniques are broadly applicable
– Techniques are necessitated by the programming model, not hardware
– Techniques are often not necessary when input program is sequential

Compilation techniques for Memory Consistency Models

SLIDE 3

Hardware and Language Models

[Figure: layered stack, top to bottom]

Programmers
Language memory model: orders enforced by compiler and hardware fences or syncs
Compiler
Hardware memory model: orders enforced by hardware
H/W

SLIDE 4

Outline

  • Part I: Introduction

– Memory Consistency Models
– Compiling for Memory Consistency

  • Part II: Compiler Analysis

– Delay set analysis
– Synchronization analysis

  • Part III: Results and Conclusion

SLIDE 5

When do consistency issues arise?

  • Issues arise whenever state is shared across different threads of execution
  • Typically reads/writes to shared memory
  • Synchronization, I/O, … also require attention be paid to consistency issues

– The typical programmer assumes a program will act as if operations occur in the order written
– Not true for most consistency models

  • We will use shared memory reads/writes in our examples for simplicity. Reads/writes can be DMA operations or synchronization

SLIDE 6

Sequential Consistency (SC)

[Figure: Processors 1 to 4, each issuing a stream of instructions, with an ordering requirement on each stream]

SLIDE 7

SC intuitive, but no free lunch

flag = 0; a = null;

Thread 0:
a = f( );
flag = 1;
x[2] = 4;

Thread 1:
while (flag == 0);
b = a;
for (i=0; i<n; i++) { x[i] = …

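A minimal Java sketch of the flag handoff above, assuming `flag` is declared `volatile` so that the Java memory model keeps `a = f()` ordered before `flag = 1`. The class name `FlagHandoff`, the use of `int` for `a`, and the body of `f()` are hypothetical stand-ins and are not taken from the talk.

```java
class FlagHandoff {
    static int a = 0;              // plain shared data, the slide's "a" (an int here for brevity)
    static volatile int flag = 0;  // volatile: the write to a may not be moved after flag = 1
    static int b = -1;             // written only by Thread 1

    // Hypothetical stand-in for the slide's f().
    static int f() { return 42; }

    static int run() {
        a = 0; flag = 0; b = -1;
        Thread t0 = new Thread(() -> {
            a = f();
            flag = 1;                // volatile write publishes the earlier write to a
        });
        Thread t1 = new Thread(() -> {
            while (flag == 0) { }    // volatile read, re-read on every iteration
            b = a;                   // sees the value written before flag = 1
        });
        t1.start();
        t0.start();
        try {
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints 42
    }
}
```

Without `volatile`, the compiler or hardware would be free to hoist the read of `flag` out of the loop or to reorder the two writes in Thread 0, which is exactly the kind of transformation an SC compiler has to rule out.
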
SLIDE 8

SC is harder to compile than sequential

  • Whether a variable reference can be

– strength reduced to a register reference,
– hoisted from a loop,
– or otherwise moved

depends in part on how it is used in other threads -- requires inter-thread analysis

SLIDE 9

Relaxed Consistency (RC)

  • Like sequential programs, only requires relations among variable accesses within a thread to be analyzed when performing optimizations (e.g. dependence/alias analysis)
  • Examples:

– Weak consistency, Release consistency, Java memory model

  • Semantics for well synchronized programs same as SC

SLIDE 10

RC versus SC

  • SC better for programmability

– Fewer re-orderings for programmer to reason about
– Compiler cannot naively reorder any memory accesses

  • RC better for performance

– Allow accesses to overlap or be re-ordered

  • Can we recover performance for SC?

– Compiler analysis to determine orders that really need to be enforced

Mark Hill, Multiprocessors should support simple memory-consistency models, IEEE Computer, August 1998

SLIDE 11

When can memory ops be moved?

flag = 0; a = null;

Thread 0:
a = f( );
flag = 1;
x[2] = 4;

Thread 1:
while (flag == 0);
b = a;
x[2] = …

[Figure labels: Conflict edges, Program edge]

SLIDE 12

When can memory ops be moved?

flag = 0; a = null;

Thread 0:
a = f( );
flag = 1;
x[2] = 4;

Thread 1:
while (flag == 0);
b = a;
x[2] = …

[Figure label: Oriented conflict edges]

SLIDE 13

How bad orientations can exist

flag = 0; a = null;

Thread 0:
flag = 1;
a = f( );
x[2] = 4;

Thread 1:
while (flag == 0);
b = a;
x[2] = …

SLIDE 14

Graph for RC -- program edges only exist between dependent references

flag = 0; a = null;

Thread 0:
a = f( );
flag = 1;
x[2] = 4;

Thread 1:
while (flag == 0);
b = a;
for (i=0; i<n; i++) { x[i] = …

[Figure label: Conflict edges]

SLIDE 15

What a consistency aware compiler must do

  • Program edges involved in cycles must be treated like a dependence and enforced
  • Therefore, a consistency aware compiler must determine intra-thread memory operation orderings that must be enforced because of

– Inter-thread relationships
– Traditional dependence relationships

  • Ordering of operations that cannot be violated because of inter-thread relationships are delays

SLIDE 16

Pensieve Compiler for SC

[Figure: compiler pipeline, built on Jikes RVM]

Inputs: Source Java Bytecode Program, Hardware Memory Model
Program Analysis: Thread Escape Analysis, Alias Analysis, Synchronization Analysis, Delay Set Analysis
Result of analysis: Ordering Constraints
Transformations: Code Re-ordering and Code Elimination Transformations, Barrier Insertion and Optimization
Output: Target Machine Code (SC)

SLIDE 17

How is this handled in current languages?

  • MPI avoids these issues by not having shared state
  • OpenMP avoids these issues by requiring

– shared state in parallel regions to be in an atomic block
– Shared state accessed via reduction, etc
– Otherwise results are undefined

  • Standard Java avoids this by using a relaxed model
  • C/C++/Pthreads basically undefined

SLIDE 18

These work, but …

  • All of these solutions have problems

– Shared memory programming model sometimes useful
– Undefined results make debugging hard
– Most programmers think SC

SLIDE 19

Outline

  • Part I: Introduction

– Memory Consistency Models
– Compiling for Memory Consistency

  • Part II: Compiler Analysis

– Escape analysis
– Alias analysis (simple type based) [Sura, PPoPP05]
– Synchronization analysis
– Delay set analysis

  • Part III: Results and Conclusion

SLIDE 20

Thread Escape Analysis

  • Find references to objects that may be accessed in two or more threads

– In Java, these are objects accessed directly, or indirectly, from static fields or thread object fields

  • Java does not allow arguments to be passed to thread run methods
  • Rather, the “arguments” are passed to the thread constructor, and stored in a field in the constructed thread object

– Can be modeled as a reachability problem - an object that can be reached (directly or indirectly) by something reachable from 2 or more threads thread-escapes.

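The reachability formulation above can be sketched in Java as follows. This is an illustrative toy, not the Pensieve implementation: the heap graph is a hypothetical map from object names to the objects their fields reference, and `rootsByThread` is an assumed list of objects each thread can reach directly (for example through its thread object fields or static fields).

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ThreadEscapeSketch {
    // All objects reachable from the given roots by following field references.
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            for (String next : refs.getOrDefault(obj, List.of())) {
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    // An object thread-escapes if it is reachable from two or more threads.
    static Set<String> escaping(Map<String, List<String>> refs,
                                Map<String, List<String>> rootsByThread) {
        Map<String, Integer> threadCount = new HashMap<>();
        for (List<String> roots : rootsByThread.values()) {
            for (String obj : reachable(refs, roots)) {
                threadCount.merge(obj, 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : threadCount.entrySet()) {
            if (e.getValue() >= 2) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Hypothetical heap: both threads hold "queue"; only T0 holds "local0".
    static Set<String> demo() {
        Map<String, List<String>> refs = Map.of(
                "queue", List.of("node"),
                "local0", List.of("tmp"));
        Map<String, List<String>> rootsByThread = Map.of(
                "T0", List.of("queue", "local0"),
                "T1", List.of("queue"));
        return escaping(refs, rootsByThread);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [node, queue]
    }
}
```

In the demo, `node` escapes only indirectly, through `queue`, while `local0` and `tmp` stay thread-local, so accesses to them would need no ordering constraints.
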
SLIDE 21

Two phase escape analysis

[Lee, PACT06]

  • Uses a slow, off-line analysis to build a very precise connection graph for available classes
  • Results of this analysis are converted to level summary form for the on-line phase

– The level summary form is used to reconcile reachability information from

  • Classes not seen during the offline analysis,
  • Classes that have changed since being seen in the offline analysis

SLIDE 22

Online Escape Information Representation

  • Level Summary: <level, EscapeState>

– A conservative, compact representation for parameter or argument at call site
– Tells us where escape happens
– Level summary for p1 is <2, ThreadEscape>
– <∞, NoEscape> for p2
– <0, ThreadEscape> for p3

[Figure: connection graphs for p1, p2 and p3, with nodes labeled NoEscape, ThreadEscape and Thread1]

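One way to read the three example summaries is that the level is the number of reference hops from the parameter to the nearest thread-escaping object. The Java sketch below computes that reading with a breadth-first search. This interpretation, the graph shape, and all names are assumptions made for illustration; the precise definition is in [Lee, PACT06].

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LevelSummarySketch {
    static final int INFINITY = Integer.MAX_VALUE;   // stands for the slide's "∞"

    // Depth of the nearest escaping object reachable from start, or INFINITY if none.
    static int level(Map<String, List<String>> refs, Set<String> escaping, String start) {
        Set<String> seen = new HashSet<>(List.of(start));
        List<String> frontier = List.of(start);
        for (int depth = 0; !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String obj : frontier) {
                if (escaping.contains(obj)) {
                    return depth;
                }
                for (String n : refs.getOrDefault(obj, List.of())) {
                    if (seen.add(n)) {
                        next.add(n);
                    }
                }
            }
            frontier = next;
        }
        return INFINITY;
    }

    static String summary(Map<String, List<String>> refs, Set<String> escaping, String start) {
        int l = level(refs, escaping, start);
        return l == INFINITY ? "<inf, NoEscape>" : "<" + l + ", ThreadEscape>";
    }

    // Hypothetical graph: p1 -> x -> y (escaping), p2 -> z, p3 itself escaping.
    static final Map<String, List<String>> REFS = Map.of(
            "p1", List.of("x"),
            "x", List.of("y"),
            "p2", List.of("z"));
    static final Set<String> ESCAPING = Set.of("y", "p3");

    public static void main(String[] args) {
        System.out.println(summary(REFS, ESCAPING, "p1"));   // <2, ThreadEscape>
        System.out.println(summary(REFS, ESCAPING, "p2"));   // <inf, NoEscape>
        System.out.println(summary(REFS, ESCAPING, "p3"));   // <0, ThreadEscape>
    }
}
```
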
SLIDE 23

Alias analysis

  • Looks for references in different threads that may access the same object
  • If at least one reference writes the object, a conflict exists between the two references
  • In the Pensieve system, a simple alias analysis that assumes references to objects of the same type are aliased

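A small Java sketch of the type-based conflict rule described above. The `Access` class and its fields are hypothetical and only model what the bullets state: two references conflict if they are in different threads, refer to objects of the same type, and at least one of them is a write.

```java
class TypeAliasSketch {
    // Hypothetical model of one memory reference.
    static final class Access {
        final String thread;    // thread containing the reference
        final String type;      // static type of the referenced object
        final boolean isWrite;  // true for a write, false for a read

        Access(String thread, String type, boolean isWrite) {
            this.thread = thread;
            this.type = type;
            this.isWrite = isWrite;
        }
    }

    // Same-type references are conservatively assumed to alias.
    static boolean mayAlias(Access p, Access q) {
        return p.type.equals(q.type);
    }

    // Conflict: different threads, possibly the same object, at least one write.
    static boolean conflict(Access p, Access q) {
        return !p.thread.equals(q.thread) && mayAlias(p, q) && (p.isWrite || q.isWrite);
    }

    public static void main(String[] args) {
        Access w = new Access("T0", "Buffer", true);
        Access r = new Access("T1", "Buffer", false);
        Access other = new Access("T1", "Counter", false);
        System.out.println(conflict(w, r));      // true
        System.out.println(conflict(w, other));  // false: different types
        System.out.println(conflict(r, r));      // false: same thread, no write
    }
}
```

The rule is cheap but coarse: every pair of same-type references in different threads is treated as a potential conflict, so a more precise alias analysis would remove conflict edges.
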
SLIDE 24

Synchronization Analysis

[Sura, PPoPP05]

  • Consider two types of synchronization:

– Thread structure, due to thread start() and thread join() calls
– Locking, due to synchronized blocks
– Yelick has looked at prod./consumer ordering

  • Determine access orderings across threads, code that cannot execute concurrently
  • Yields more accurate graph to find delays

– Eliminate conflict edges
– Order shared memory accesses

SLIDE 25

Delay Set Analysis

[Sura, PPoPP05]

  • Delay set: pairs of shared memory accesses (X,Y) in the same thread whose order must be enforced
  • Shasha and Snir (TOPLAS 88) show how to find the minimal delay set

– Build graph to capture all possible access orders in program execution
– Yelick showed this is NP-hard to solve - heuristics needed

SLIDE 26

Ordering Requirement

Thread 1:
a) X = 1
b) Y = 1

Other thread:
c) Z = Y
d) W = X

[Figure annotations: "Captures (b) has occurred before (c)"; "Captures (a) has not yet occurred"; legend: Program edge, Conflict edge]

Must enforce order: (a) before (b) in Thread 1

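To see why (a) must stay before (b), the Java sketch below enumerates every sequentially consistent interleaving of the two threads above and collects the final values of Z and W. It then repeats the enumeration with (a) and (b) swapped in Thread 1. The encoding of operations as `{destination, source}` index pairs is a hypothetical convenience; all variables start at 0.

```java
import java.util.Set;
import java.util.TreeSet;

class ScOutcomes {
    // Memory layout and a marker meaning "store the constant 1".
    static final int X = 0, Y = 1, Z = 2, W = 3, ONE = -1;

    // Apply one operation {dest, src} to a copy of memory.
    static int[] step(int[] mem, int[] op) {
        int[] next = mem.clone();
        next[op[0]] = (op[1] == ONE) ? 1 : mem[op[1]];
        return next;
    }

    // Explore all interleavings that keep each thread's own program order.
    static void explore(int[][] t1, int[][] t2, int i, int j, int[] mem, Set<String> out) {
        if (i == t1.length && j == t2.length) {
            out.add("Z=" + mem[Z] + ",W=" + mem[W]);
            return;
        }
        if (i < t1.length) {
            explore(t1, t2, i + 1, j, step(mem, t1[i]), out);
        }
        if (j < t2.length) {
            explore(t1, t2, i, j + 1, step(mem, t2[j]), out);
        }
    }

    static Set<String> outcomes(int[][] t1, int[][] t2) {
        Set<String> out = new TreeSet<>();
        explore(t1, t2, 0, 0, new int[4], out);
        return out;
    }

    // Thread 1 as written: (a) X = 1; (b) Y = 1.
    static Set<String> inOrder() {
        return outcomes(new int[][] {{X, ONE}, {Y, ONE}}, new int[][] {{Z, Y}, {W, X}});
    }

    // Thread 1 with (a) and (b) swapped, as a reordering compiler might emit it.
    static Set<String> reordered() {
        return outcomes(new int[][] {{Y, ONE}, {X, ONE}}, new int[][] {{Z, Y}, {W, X}});
    }

    public static void main(String[] args) {
        System.out.println(inOrder());    // [Z=0,W=0, Z=0,W=1, Z=1,W=1]
        System.out.println(reordered());  // [Z=0,W=0, Z=0,W=1, Z=1,W=0, Z=1,W=1]
    }
}
```

The outcome Z=1, W=0 appears only after the swap: it means the other thread saw (b) but not (a), which no SC execution of the original program allows.
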
SLIDE 27

Simplified Delay Set Analysis

  • Look for end-points of a possible cycle
  • Delay from A to B if:

– A and B are thread escaping, and
– Conflict edge between A and some access X, and
– Conflict edge between B and some access Y

[Figure: accesses A, B, X and Y; program edge from A to B to test for delay; hypothesized path that completes cycle]

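The three conditions above translate into a short check. The Java sketch below is an illustration under simplifying assumptions, not the Pensieve code: accesses are listed in program order, a conflict is modeled as two accesses in different threads to the same variable with at least one write, and the `Access` class and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SimplifiedDelaySketch {
    // Hypothetical model of one memory access.
    static final class Access {
        final String name; final int thread; final String var;
        final boolean write; final boolean escapes;

        Access(String name, int thread, String var, boolean write, boolean escapes) {
            this.name = name; this.thread = thread; this.var = var;
            this.write = write; this.escapes = escapes;
        }
    }

    static boolean conflicts(Access p, Access q) {
        return p.thread != q.thread && p.var.equals(q.var) && (p.write || q.write);
    }

    static boolean hasConflictEdge(Access p, List<Access> all) {
        for (Access q : all) {
            if (conflicts(p, q)) {
                return true;
            }
        }
        return false;
    }

    // Delay A->B if A precedes B in the same thread, both escape, and each has a conflict edge.
    static List<String> delays(List<Access> all) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < all.size(); i++) {
            for (int j = i + 1; j < all.size(); j++) {
                Access a = all.get(i), b = all.get(j);
                if (a.thread == b.thread && a.escapes && b.escapes
                        && hasConflictEdge(a, all) && hasConflictEdge(b, all)) {
                    out.add(a.name + "->" + b.name);
                }
            }
        }
        return out;
    }

    // X = 1; Y = 1; tmp = 0 in thread 1, and Z = Y; W = X in thread 2 (reads of Y and X only).
    static List<String> demo() {
        return delays(List.of(
                new Access("a", 1, "X", true, true),
                new Access("b", 1, "Y", true, true),
                new Access("e", 1, "tmp", true, false),
                new Access("c", 2, "Y", false, true),
                new Access("d", 2, "X", false, true)));
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [a->b, c->d]
    }
}
```

The check is conservative: it never builds the cycle, it only asks whether both end-points could be on one, so it may report delays that the full analysis would prove unnecessary.
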
SLIDE 28

Outline

  • Part I: Introduction

– Memory Consistency Models
– Compiling for Memory Consistency

  • Part II: Compiler Analysis

– Delay set analysis
– Synchronization analysis

  • Part III: Results and Conclusions

SLIDE 29

Benchmarks

Benchmark | Source | Bytecodes
GeneticAlgo | Stephen Hartley’s code plus Doug Lea’s library | 30,147
Hashmap | Doug Lea | 24,989
BoundedBuf | Uses Doug Lea’s library | 12,050
Sieve | Stephen Hartley | 10,811
DiskSched | Doug Lea | 21,186
Montecarlo | Java Grande Forum | 63,452
Raytracer | Java Grande Forum | 33,198
MolDyn | Java Grande Forum | 26,913
SPECmtrt | SPECjvm98 | 290,260

[Thread Types column: values not recoverable]

SLIDE 30

Experiments

Execution time for three configurations:

  • 1. Base: default JikesRVM with a relaxed consistency model
  • 2. Escape: sequential consistency, using:
  • iterative, context-sensitive escape analysis
  • order enforced between each pair of escaping accesses
  • 3. Delay: SC using the escape analysis above, and our synchronization analysis and delay set analysis

SLIDE 31

System Configuration

  • Intel Pentium 4 Platform

– Dell PowerEdge 6600 SMP, using two 1.5GHz Xeon processors with 6GB system memory

  • PowerPC numbers available in papers

SLIDE 32

Compilation Time

[Bar chart: Compilation Time [sec], y-axis 2 to 14; series: field-type, connect3, two-phase; striped bar: other compilation costs]


Dynamic Fence Counts

[Bar chart: Dynamic Fence Counts, log-scale y-axis from 1.0E+00 to 1.0E+10; series: field-type, connect3, two-phase]

SLIDE 34

Slowdown Relative to RC

[Bar chart: Slowdown, y-axis 0.00 to 5.00; series: field-type, connect3, two-phase]

SLIDE 35

SC – Slowdown Relative to RC

[Bar chart: Slowdown Relative To RC, y-axis 0.00 to 9.00; benchmarks: mtrt, mold., mont., rayt., bound., disk., genet., hash., sieve; series: perfect, two-ph., connect, ruf, bogda, none; off-scale bars labeled 11.24, 14.18, 11.15, 11.81, 14.14, 14.16]

SLIDE 36

Some related work

  • Yelick and Krishnamurthy - Itanium and UPC - focus is on SPMD style programs
  • Von Praun and Gross - optimizations on Java programs
  • Shasha and Snir - original paper on delay set analysis
  • Sreedhar, Gao -- lock assignment
  • Early work out of DEC WRL -- no opts

SLIDE 37

Conclusions

  • Techniques enable fast and effective inter-thread analysis for object-oriented programs using shared memory
  • Sensitive to accuracy of escape analysis
  • SC shows average slowdown of ~20% on an Intel Xeon platform (1.17 and 1.23 w/perfect, 1.26 with two phase)

SLIDE 38

Questions?