SLIDE 1

University of Michigan, Electrical Engineering and Computer Science

Uncovering Hidden Loop Level Parallelism in Sequential Applications

Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, Scott Mahlke
Advanced Computer Architecture Lab, University of Michigan

SLIDE 2

CMP Architectures

  • Multiple cores on a chip
    – Higher throughput
    – Reduced complexity (per core)
    – More power/heat friendly
  • Multithreaded applications

[Figure: example CMPs: Intel Core 2 Duo, AMD Quad-core (Barcelona), Sun Niagara 2]

SLIDE 3

How About Single Thread?

[Source: Bridges et al., MICRO '07]


SLIDE 5

Loop Level Parallelization

[Figure: a DOALL loop with iterations i = 0-39, split into two halves (i = 0-19 and i = 20-39) that run on Core 0 and Core 1]
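
The split shown in the figure can be sketched in C as follows. This is only an illustration of iteration chunking: `chunk_t`, `chunk_bounds` and `chunk_owner` are hypothetical names, and the round-robin assignment of chunks to cores is an assumption, not something the slides specify.

```c
#include <assert.h>

/* Hypothetical helper type: half-open iteration range [start, end) of one chunk. */
typedef struct { int start; int end; } chunk_t;

/* Split iterations 0..n-1 into num_chunks nearly equal chunks and
 * return the bounds of chunk c (0 <= c < num_chunks). */
static chunk_t chunk_bounds(int n, int num_chunks, int c) {
    int base = n / num_chunks;
    int extra = n % num_chunks;   /* the first `extra` chunks get one more iteration */
    chunk_t r;
    r.start = c * base + (c < extra ? c : extra);
    r.end = r.start + base + (c < extra ? 1 : 0);
    return r;
}

/* Assumed policy: chunks are handed to cores round-robin. */
static int chunk_owner(int c, int num_cores) {
    return c % num_cores;
}
```

With n = 40 and two chunks this gives the halves 0-19 and 20-39; with four chunks on two cores, each core receives two chunks of ten iterations.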


SLIDE 8

Loop Level Parallelization

[Figure: a speculative DOALL loop with iterations i = 0-39, divided into loop chunks i = 0-9, 10-19, 20-29 and 30-39 that are distributed over Core 0 and Core 1]

Bad news: limited number of parallel loops in general-purpose applications

– 1.3x speedup for SpecINT2000 on 4 cores

SLIDE 9

Contributions

  • Code generation framework
    – Speculative parallelization of uncounted loops
  • Compiler transformations
    – Speculative loop fission
    – Isolation of infrequent dependences
    – Speculative prematerialization

[Figure: generated code template with blocks labeled Spawn, Initialization, Abort Handler and Consolidation]

    XBEGIN
    if (global_brk_flag) break;
    for (i = IS; i < IE; i++) {
        ......
        if (brk_cond) { local_brk_flag = 1; break; }
    }
    perm = RECV(THREADj-1)
    XCOMMIT
    if (local_brk_flag) { global_brk_flag = 1; kill_other_threads; }
    else if (IE < n) SEND(perm, THREADj+1)
    IS = ...; IE = ...;


SLIDE 12

Target Architecture

[Figure: four cores (Core 0 to Core 3) with L2 caches, a scalar operand network, and hardware transactional memory]

SLIDE 13

Code Generation Framework

    for (i = 0; i < n; i++)
        // original loop code


SLIDE 20

Code Generation Framework

  • Supports counted and uncounted loops
    – Software-managed control speculation
  • Iteration chunking
  • Enforces transaction ordering
  • Handles live-in, live-out and accumulator registers

    Spawn
    while (...) {
        IS += ...; IE += ...;
        XBEGIN
        if (globalBrk) break;
        for (i = IS; i < IE; i++) {
            // original loop code
            if (brkCond) { localBrk = 1; break; }
        }
        RECV(THREADj-1)
        XCOMMIT
        if (localBrk) { globalBrk = 1; abortOtherTXs; }
        SEND(THREADj+1)
    }
    Consolidation
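
The control flow of the template can be imitated in plain C by running the chunks one after another in commit order. The sketch below is only a sequential emulation under stated assumptions: there are no real threads or transactions, merging a per-chunk accumulator stands in for XCOMMIT, and the example loop (summing until the first negative element) and all function names are made up for illustration.

```c
#include <assert.h>

/* Original uncounted loop: sum elements until the first negative value. */
static long sum_until_negative(const int *a, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] < 0) break;          /* brkCond */
        sum += a[i];
    }
    return sum;
}

/* Chunked version, executed one chunk at a time in commit order. */
static long sum_until_negative_chunked(const int *a, int n, int chunk_size) {
    long sum = 0;                     /* committed accumulator */
    int global_brk = 0;
    for (int is = 0; is < n && !global_brk; is += chunk_size) {
        int ie = (is + chunk_size < n) ? is + chunk_size : n;
        long local_sum = 0;           /* per-chunk accumulator */
        int local_brk = 0;
        for (int i = is; i < ie; i++) {
            if (a[i] < 0) { local_brk = 1; break; }
            local_sum += a[i];
        }
        sum += local_sum;             /* stands in for XCOMMIT */
        if (local_brk)
            global_brk = 1;           /* later chunks would be aborted */
    }
    return sum;
}
```

In this emulation the chunked version returns the same result as the original loop for any positive chunk size, because a chunk after the one that hit the break never contributes to the committed sum.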


SLIDE 24

DOALL Coverage – Provable and Profiled

[Figure: bar chart of the fraction of sequential execution (0 to 1) covered by provable DOALL and profiled DOALL loops, per benchmark and on average.
SPEC FP: 052.alvinn, 056.ear, 171.swim, 172.mgrid, 177.mesa, 179.art, 183.equake, 188.ammp.
SPEC INT: 008.espresso, 023.eqntott, 026.compress, 072.sc, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 164.gzip, 175.vpr, 181.mcf, 197.parser, 256.bzip2, 300.twolf.
Mediabench: cjpeg, djpeg, epic, g721decode, g721encode, gsmdecode, gsmencode, mpeg2dec, mpeg2enc, pegwitdec, pegwitenc, rawcaudio, rawdaudio, unepic.
Utilities: grep, lex, yacc.]

Still not good enough! A few dependences hinder parallelization in many loops.

Compiler can help:

  • Speculative fission
  • Isolation of infrequent paths
  • Speculative prematerialization

SLIDE 28

Speculative Loop Fission

Original loop:

    1:  while (node) {
    2:      work(node);
    3:      node = node->next;
        }

After fission, sequential part:

    1:  while (node) {
    4:      node_array[count++] = node;
    3:      node = node->next;
        }

After fission, parallel part (one chunk):

        XBEGIN
    5:  node = node_array[IS];
        i = 0;
    1': while (node && i++ < CS) {
    2:      work(node);
    3':     node = node->next;
        }
        RECV(THREADj-1)
        XCOMMIT
        if (node != node_array[IS+CS]) {
            update_node_array;
            kill_other_threads();
        }
        SEND(THREADj+1)
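
A minimal C sketch of the two parts, run sequentially, may help to see the validation step. It assumes a simple `node_t` list type and a stand-in `work` that doubles a value; `collect_nodes` and `run_chunk` are hypothetical names, and the transactional and messaging primitives from the slide are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed list node type for the sketch. */
typedef struct node { int value; struct node *next; } node_t;

/* Stand-in for work(node): doubles the stored value. */
static void work(node_t *n) { n->value *= 2; }

/* Sequential part: walk the list once and record every node pointer. */
static size_t collect_nodes(node_t *head, node_t **node_array, size_t cap) {
    size_t count = 0;
    for (node_t *n = head; n != NULL && count < cap; n = n->next)
        node_array[count++] = n;
    return count;
}

/* Parallel part for the chunk starting at index is (is < count) with chunk
 * size cs. Returns 1 if the speculated start of the next chunk was right,
 * 0 if the chunk misspeculated and later chunks would have to be redone. */
static int run_chunk(node_t **node_array, size_t count, size_t is, size_t cs) {
    node_t *n = node_array[is];
    size_t i = 0;
    while (n != NULL && i < cs) {
        work(n);
        n = n->next;
        i++;
    }
    node_t *expected = (is + cs < count) ? node_array[is + cs] : NULL;
    return n == expected;
}
```

If `work` never changes the list structure, every chunk validates. If it unlinks or inserts nodes, the pointer reached at the end of a chunk no longer matches the recorded one and the check fails.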



SLIDE 36

Infrequent Dependence Isolation

[Figure: control-flow graph of a loop with blocks A, B and C and branch probabilities 99% and 1%, shown before and after the transformation; the transformed version adds a copy C' of block C and a break on the infrequent path]


SLIDE 40

Infrequent Dependence Isolation

Sample loop from yacc benchmark:

    for (j = 0; j <= nstate; ++j) {
        if (tystate[j] == 0) continue;
        if (tystate[j] == best) continue;
        count = 0;
        cbest = tystate[j];
        for (k = j; k <= nstate; ++k)
            if (tystate[k] == cbest) ++count;
        if (count > times) {            /* 1% */
            best = cbest;
            times = count;
        }
    }

After isolation:

    j = 0;
    while (j <= nstate) {
        for ( ; j <= nstate; ++j) {
            if (tystate[j] == 0) continue;
            if (tystate[j] == best) continue;
            count = 0;
            cbest = tystate[j];
            for (k = j; k <= nstate; ++k)
                if (tystate[k] == cbest) ++count;
            if (count > times) break;   /* 1% */
        }
        if (count > times) {            /* 1% */
            best = cbest;
            times = count;
            j++;
        }
    }
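
To check that the two versions of the loop agree, both can be wrapped as C functions and run on the same input. The wrapper names, the `best_t` result type and the initial values (`best = 0`, `times = 0`, `count = 0`) are assumptions made for this sketch; the slide shows only the loop bodies.

```c
#include <assert.h>

/* Hypothetical result type: most frequent nonzero state and its count. */
typedef struct { int best; int times; } best_t;

/* Loop as it appears in the original code. tystate has nstate + 1 entries. */
static best_t most_common_original(const int *tystate, int nstate) {
    int best = 0, times = 0;
    for (int j = 0; j <= nstate; ++j) {
        if (tystate[j] == 0) continue;
        if (tystate[j] == best) continue;
        int count = 0;
        int cbest = tystate[j];
        for (int k = j; k <= nstate; ++k)
            if (tystate[k] == cbest) ++count;
        if (count > times) { best = cbest; times = count; }
    }
    return (best_t){ best, times };
}

/* Loop after isolation: the rare update of best/times is moved out of the
 * inner loop, which now only reads them and is a speculative DOALL candidate. */
static best_t most_common_isolated(const int *tystate, int nstate) {
    int best = 0, times = 0;
    int count = 0, cbest = 0;   /* count starts at 0 so the outer test is safe */
    int j = 0;
    while (j <= nstate) {
        for (; j <= nstate; ++j) {
            if (tystate[j] == 0) continue;
            if (tystate[j] == best) continue;
            count = 0;
            cbest = tystate[j];
            for (int k = j; k <= nstate; ++k)
                if (tystate[k] == cbest) ++count;
            if (count > times) break;       /* infrequent path: leave the loop */
        }
        if (count > times) {                /* perform the rare update, then resume */
            best = cbest;
            times = count;
            j++;
        }
    }
    return (best_t){ best, times };
}
```

Note that the outer `if (count > times)` relies on `count` holding a value no larger than `times` when the inner loop ends normally, which is why the sketch initializes it to 0.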



SLIDE 45

DOALL Coverage – Profiled and Transformed

[Figure: bar chart of the fraction of sequential execution (0 to 1) covered by DOALL loops for each SPEC FP, SPEC INT, Mediabench and Utilities benchmark and on average, comparing profiled + provable coverage with coverage after transformations]

SLIDE 46

Coverage Breakdown

[Figure: stacked bar chart of the fraction of sequential execution (axis ticks 10 to 70) for SpecINT, MediaBench and Utilities, broken down into: DOALL loops; control speculation for uncounted loops; speculative fission; speculative prematerialization; infrequent dependence isolation; DOALL loops after transformations]


SLIDE 48

Experimental Setup

  • OpenIMPACT compiler
  • Multicore simulator
    – Simulates up to 8 ARM9-like processors
    – Models scalar operand network
    – Assumes perfect memory system
    – Uses STM library to emulate HTM functionality


SLIDE 50

Speedup

[Figure: per-benchmark speedup (axis 1 to 5) on 2, 4 and 8 cores, with and without transformations, for the SPEC FP, SPEC INT, Mediabench and Utilities benchmarks and their average; four off-scale bars are labeled 7.89, 7.37, 7.87 and 6.44]

1.36x, 1.84x and 2.34x speedup on 2, 4 and 8 cores

SLIDE 51

Conclusion

  • Figure out ways to use available resources for legacy applications
    – Code such as error handlers and linked-list and tree traversals limits parallelism
  • Compiler analysis and optimization look promising
  • 1.84x speedup on 4 cores after transformations, compared to 1.41x without them

SLIDE 52

Questions?

Thank you!

SLIDE 53

SpecDSWP vs. Speculative Fission

[Figure: a loop body made of three blocks, A, B and C]

SLIDE 54

SpecDSWP vs. Speculative Fission

[Figure: two schedules of the block instances A0-A3, B0-B3 and C0-C3 on Core 0 to Core 3, one for each of the two techniques]


SLIDE 56

Speculative Prematerialization

Original loop:

    for (...) {
    1:      current = ...;
    2:      work(last);
    3:      last = current;
    }

With prematerialization:

        XBEGIN
    1': current = ...;
    3': last = ...;
        for (...) {
    1:      current = ...;
    2:      work(last);
    3:      last = current;
        }
        XCOMMIT
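
The idea can be tried out in C under one simplifying assumption: the value assigned to `current` depends only on the iteration index, so a chunk can recompute the `last` it needs by re-running statements 1 and 3 for the iteration just before it. `produce`, `run_original` and `run_chunk_premat` are hypothetical names, and adding `last` to an accumulator stands in for `work(last)`.

```c
#include <assert.h>

/* Assumed producer for statement 1: depends only on the iteration index. */
static int produce(int i) { return 3 * i + 1; }

/* Original loop: each iteration consumes the value made by the previous one. */
static long run_original(int n, int initial_last) {
    long acc = 0;
    int last = initial_last;
    for (int i = 0; i < n; i++) {
        int current = produce(i);   /* 1 */
        acc += last;                /* 2: stands in for work(last) */
        last = current;             /* 3 */
    }
    return acc;
}

/* One chunk [is, ie): prematerialize `last` before the loop instead of
 * waiting for the previous chunk to finish. */
static long run_chunk_premat(int is, int ie, int initial_last) {
    int last = initial_last;
    if (is > 0) {
        int current = produce(is - 1);  /* 1' */
        last = current;                 /* 3' */
    }
    long acc = 0;
    for (int i = is; i < ie; i++) {
        int current = produce(i);
        acc += last;
        last = current;
    }
    return acc;
}
```

Summing the results of `run_chunk_premat(0, 5, x)` and `run_chunk_premat(5, 10, x)` gives the same total as `run_original(10, x)`, and the two chunks no longer depend on each other through `last`.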