Speculative Multithreading in a Java Virtual Machine (Chris Pickett)



slide-1
SLIDE 1

Speculative Multithreading in a Java Virtual Machine

Chris Pickett and Clark Verbrugge School of Computer Science McGill University May 17, 2005

slide-2
SLIDE 2

Outline

1. Introduction
2. Design
3. Experimental Analysis
4. Conclusions and Future Work

slide-3
SLIDE 3

Outline

1. Introduction
2. Design
3. Experimental Analysis
4. Conclusions and Future Work

slide-4
SLIDE 4

Motivation

Speculative multithreading (SpMT) has great promise:

- Dynamic parallelisation of irregular, non-numerical programs
- Good potential for speedup in Java (speedups of 1.5 to 5.0 on SPECjvm98 on a simulated 8-way machine)

- Simulated hardware is the primary target; software SpMT is rare.
- Not being hardware people, we wanted to try our hand at a software implementation.
- The Java Virtual Machine provides a convenient hardware abstraction layer.
- We decided to use SableVM, our lab’s free/open source JVM.

slide-5
SLIDE 5

Speculative Method-Level Parallelism (SMLP)

slide-6
SLIDE 6

Contributions

1. First complete implementation of (SMLP-based) SpMT in a (Java) virtual machine (SableVM)
2. Ability to run SPECjvm98 at size 100
3. Single-threaded simulation and true multithreaded execution modes
4. Experimental analysis of overhead costs and parallelism achieved

Unfortunately, no speedup :(

slide-7
SLIDE 7

Outline

1. Introduction
2. Design
3. Experimental Analysis
4. Conclusions and Future Work

slide-8
SLIDE 8

Execution Environment

slide-9
SLIDE 9

Parallel Instruction Code Arrays

slide-10
SLIDE 10

Modified Java Bytecode Instructions

Table columns: instruction; reads global; writes global; locks object; unlocks object; allocates object; throws exception; enters native code; loads classes; forces stop.

Only the non-empty cells of each row survive, listed here in column order:

GETFIELD: always, sometimes, first time, sometimes
GETSTATIC: always, first time, first time
<X>ALOAD: always, sometimes, sometimes
PUTFIELD: always, sometimes, first time, sometimes
PUTSTATIC: always, first time, first time
<X>ASTORE: always, sometimes, sometimes
(I|L)(DIV|REM): sometimes, sometimes
ARRAYLENGTH: sometimes, sometimes
CHECKCAST: sometimes, first time, sometimes
ATHROW: always, always
INSTANCEOF: first time, sometimes
RET: sometimes
MONITORENTER: always, always, always, sometimes, always
MONITOREXIT: always, always, always, sometimes, always
INVOKE<X>: sometimes, sometimes, sometimes, sometimes, sometimes, first time, sometimes
<X>RETURN: sometimes, sometimes, sometimes, sometimes, sometimes, first time, sometimes
NEW: always, always, sometimes, first time, sometimes
NEWARRAY: always, always, sometimes, sometimes
ANEWARRAY: always, always, sometimes, first time, sometimes
MULTIANEWARRAY: always, always, sometimes, first time, sometimes
LDC STRING: first time, first time

slide-18
SLIDE 18

Fork Decision Factors

Child threads are forked taking several factors into account.

1. Static upper bound on method size
2. Dynamic min, max, and average method sizes
3. History of speculation successes and failures
4. History of sequence lengths
5. Number of zero length threads joined
6. Forced stop due to reaching another child (“elder sibling”)
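
Some of the factors above could feed a per-call-site heuristic along the following lines. This is a minimal sketch under stated assumptions, not SableSpMT's actual code: the class name `ForkHeuristic`, its fields, and every threshold are hypothetical, and only the static size bound, average dynamic size, success/failure history, and zero-length joins are modelled.

```java
// Hypothetical per-call-site fork heuristic. All names and thresholds are
// illustrative assumptions, not taken from SableSpMT.
final class ForkHeuristic {
    private static final int MIN_SIZE = 20;   // assumed minimum useful method size, in bytecodes
    private static final int WARM_UP = 4;     // assumed number of joins before history is trusted

    private final int staticSizeBound;        // static upper bound on callee size
    private long calls;                       // completed invocations observed
    private long totalSize;                   // sum of dynamic method sizes
    private int successes;                    // children that passed validation
    private int failures;                     // children that were aborted
    private int zeroLengthJoins;              // children joined before executing anything

    ForkHeuristic(int staticSizeBound) {
        this.staticSizeBound = staticSizeBound;
    }

    // Record the dynamic size of one completed invocation.
    void recordCall(int dynamicSize) {
        calls++;
        totalSize += dynamicSize;
    }

    // Record the outcome of joining one child forked at this call site.
    void recordJoin(boolean passed, int childLength) {
        if (childLength == 0) zeroLengthJoins++;
        if (passed) successes++; else failures++;
    }

    boolean shouldFork() {
        // A callee that can never run long leaves the child no time to do work.
        if (staticSizeBound < MIN_SIZE) return false;
        // Same test on the observed average size.
        if (calls > 0 && totalSize / calls < MIN_SIZE) return false;
        int joins = successes + failures;
        if (joins >= WARM_UP) {
            if (failures > successes) return false;          // mostly wasted work
            if (zeroLengthJoins * 2 > joins) return false;   // children rarely get to run
        }
        return true;
    }
}
```

A VM could keep one such object per call site, consult `shouldFork()` before each fork, and update it at every join.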

slide-19
SLIDE 19

Forking Speculative Threads

The actual fork process consists of several steps:

1. Copy thread JNIEnv from parent to child
2. Copy parent stack to child
3. Initialize dependence buffer
4. Adjust child’s operand stack height
5. Jump child pc over the INVOKE<X>
6. (optional) Predict return value for non-void methods
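
Steps 2, 4, 5, and 6 above can be sketched on a toy frame model. Everything here is an illustrative assumption: `Frame`, `SpeculativeFork`, and the parameters `invokeLength` and `argSlots` are hypothetical names, the operand stack holds only `int` slots, and copying the JNIEnv and initializing the dependence buffer (steps 1 and 3) are left out.

```java
import java.util.Arrays;

// Toy model of a thread's top stack frame. All names are hypothetical.
final class Frame {
    int pc;           // index of the next instruction to execute
    int[] operands;   // operand stack slots, sized for the maximum stack depth
    int height;       // number of live operand slots

    Frame(int pc, int[] operands, int height) {
        this.pc = pc;
        this.operands = operands;
        this.height = height;
    }

    // Deep copy, so the child never aliases the parent's operand stack.
    Frame copy() {
        return new Frame(pc, Arrays.copyOf(operands, operands.length), height);
    }
}

final class SpeculativeFork {
    // Builds the child's frame at the moment the parent reaches an INVOKE<X>.
    // The child resumes at the instruction after the call, as if it had returned.
    static Frame forkAtInvoke(Frame parent, int invokeLength, int argSlots,
                              boolean returnsValue, int predictedReturn) {
        Frame child = parent.copy();              // step 2: copy parent stack to child
        child.height -= argSlots;                 // step 4: the call consumes its arguments
        if (returnsValue) {                       // step 6: push a predicted return value
            child.operands[child.height++] = predictedReturn;
        }
        child.pc = parent.pc + invokeLength;      // step 5: jump over the INVOKE<X>
        return child;
    }
}
```

The predicted value is only a guess; whether it was right is decided later, when the child is joined and its return value is verified.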

slide-20
SLIDE 20

Dependence Buffering

slide-21
SLIDE 21

Joining Speculative Threads

Every SpMT child eventually reaches one of four termination conditions:

1. A pre-defined sequence length limit is reached
2. The parent thread reaches SPMT JOIN and signals the child
3. The parent thread throws an uncaught exception, and signals the child
4. Unsafe control flow is encountered

Once stopped, we begin the validation process.
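
The termination conditions above can be pictured as checks at the top of the child's interpreter loop. The sketch below is a simplification under stated assumptions: `SpeculativeChild`, `StopReason`, and the `unsafe` flag array are hypothetical, both parent-initiated stops (join reached, uncaught exception) are folded into a single signal, and running off the end of the toy program is treated as unsafe.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Why a speculative child stopped. Names are hypothetical.
enum StopReason { LENGTH_LIMIT, PARENT_SIGNAL, UNSAFE_CONTROL_FLOW }

final class SpeculativeChild {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    int executed;   // instructions executed speculatively so far

    // Called by the parent when it reaches the join point or throws an
    // uncaught exception.
    void signalStop() {
        stopRequested.set(true);
    }

    // unsafe[i] marks toy instruction i as one the child must not execute.
    StopReason run(boolean[] unsafe, int lengthLimit) {
        int pc = 0;
        while (true) {
            if (stopRequested.get()) return StopReason.PARENT_SIGNAL;
            if (executed >= lengthLimit) return StopReason.LENGTH_LIMIT;
            if (pc >= unsafe.length || unsafe[pc]) return StopReason.UNSAFE_CONTROL_FLOW;
            pc++;         // "execute" one safe instruction
            executed++;
        }
    }
}
```

Whatever the reason, the child ends up in the same place: stopped and waiting for the parent to validate it.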

slide-22
SLIDE 22

Joining Speculative Threads

Validation consists of four steps:

1. Verify the return value (if any)
2. Check the number of GCs in the child
3. Check the dependence buffers for corruption and overflow
4. Compare the values in the read buffer with main memory

If validation succeeds, then:

- Values in the write buffer are flushed to main memory
- Child stack frames are copied to the parent
- Non-speculative execution resumes where the child left off

Otherwise, the child is aborted.
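
A minimal sketch of buffered reads and writes with validation and commit, assuming a toy model in which main memory is an `int[]` and addresses are array indices. `DependenceBuffer`, `Join`, and the GC-count parameters are hypothetical names; the corruption and overflow check and the stack-frame copy are omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Toy dependence buffer: speculative loads and stores never touch main
// memory directly. All names are hypothetical.
final class DependenceBuffer {
    private final Map<Integer, Integer> reads = new HashMap<>();   // address -> value first read
    private final Map<Integer, Integer> writes = new HashMap<>();  // address -> speculative value

    // Speculative load: prefer the child's own writes, then earlier reads,
    // and only then main memory (remembering what was seen).
    int load(int[] heap, int addr) {
        Integer written = writes.get(addr);
        if (written != null) return written;
        Integer seen = reads.get(addr);
        if (seen != null) return seen;
        int value = heap[addr];
        reads.put(addr, value);
        return value;
    }

    // Speculative store: buffered until the child is committed.
    void store(int addr, int value) {
        writes.put(addr, value);
    }

    // True if every value the child read still matches main memory.
    boolean validate(int[] heap) {
        for (Map.Entry<Integer, Integer> e : reads.entrySet()) {
            if (heap[e.getKey()] != e.getValue()) return false;
        }
        return true;
    }

    // Flush buffered writes to main memory.
    void commit(int[] heap) {
        for (Map.Entry<Integer, Integer> e : writes.entrySet()) {
            heap[e.getKey()] = e.getValue();
        }
    }
}

final class Join {
    // Returns true if the child was committed, false if it must be aborted.
    static boolean join(DependenceBuffer buf, int[] heap,
                        int gcCountAtFork, int gcCountNow, boolean returnValueOk) {
        if (!returnValueOk) return false;               // return value mismatch
        if (gcCountAtFork != gcCountNow) return false;  // a GC ran since the fork
        if (!buf.validate(heap)) return false;          // a read value has changed
        buf.commit(heap);
        return true;
    }
}
```

Because nothing is written to main memory before `commit`, aborting a child amounts to discarding its buffer.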

slide-23
SLIDE 23

Single-threaded Simulation Mode

slide-24
SLIDE 24

Multithreaded Mode

slide-25
SLIDE 25

Intricacies of the Java Language

There are four Java-specific problems:

1. Native methods
2. Garbage collection
3. Exceptions
4. Synchronization

slide-26
SLIDE 26

Native Methods

- Java allows execution of non-Java (i.e. native) code.
- Native methods can be found in:
  - Class libraries
  - User code
  - VM-specific method implementations
- Native methods are needed for (amongst other things):
  - Thread management
  - Timing
  - All I/O operations
- It is safe to fork children when a parent encounters a native method.
- It is unsafe for children to enter native code.

slide-27
SLIDE 27

Garbage Collection

- SableVM uses a simple semi-space copying collector.
- Child threads started before a GC are invalidated after it.
  - This could be fixed by pinning objects, or by updating references in the dependence buffers.
- Child threads are invisible to the collector, and can continue execution during GC.
- We are able to allocate objects speculatively:
  - The heap is protected by a global mutex.
  - Instead of throwing OutOfMemoryError, speculation stops.
  - The disadvantage is increased collector pressure from failed threads.
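
Speculative allocation under a global heap lock might look roughly like this. It is a sketch under stated assumptions: `ToyHeap`, `STOP_SPECULATION`, and the bump-pointer layout are hypothetical, and the non-speculative path throws immediately where a real VM would first run the collector.

```java
// Toy bump-pointer heap shared by speculative and non-speculative threads.
// All names are hypothetical.
final class ToyHeap {
    static final int STOP_SPECULATION = -1;   // sentinel: the child must stop

    private final Object lock = new Object(); // stands in for the global heap mutex
    private final int capacity;
    private int top;                          // next free address

    ToyHeap(int capacity) {
        this.capacity = capacity;
    }

    // Returns the address of the new object, or STOP_SPECULATION when a
    // speculative thread asks for more space than is left.
    int allocate(int size, boolean speculative) {
        synchronized (lock) {
            if (top + size > capacity) {
                if (speculative) return STOP_SPECULATION;
                // A real VM would collect first and only then give up.
                throw new OutOfMemoryError("toy heap exhausted");
            }
            int addr = top;
            top += size;
            return addr;
        }
    }
}
```

Space handed to a child that later fails validation is simply garbage, which is where the extra collector pressure comes from.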

slide-28
SLIDE 28

Exceptions

- Speculatively, exceptions force threads to stop immediately:
  - Exceptions are rarely encountered.
  - Writing a speculative exception handler is tricky.
  - Speculative exceptions are likely to be incorrect.
- Non-speculatively, exceptions can be thrown and caught in the parent.
  - If uncaught, children are aborted one by one as stack frames are popped.
- Since method calls frequently occur in exception handlers, we might expect to fork children inside them.
  - This is safe!

slide-29
SLIDE 29

Synchronization

- Java allows for synchronization on a per-method and per-object basis.
- Synchronization is safe non-speculatively, but unsafe speculatively.
  - However, we can start child threads once inside a critical section; only entering and exiting is prohibited.
- Speculative locking allows critical sections to be entered and exited speculatively.
  - We’ll look into this in the future.

slide-30
SLIDE 30

Outline

1. Introduction
2. Design
3. Experimental Analysis
4. Conclusions and Future Work

slide-31
SLIDE 31

Speculation Overhead

slide-32
SLIDE 32

Non-speculative Thread Overhead Breakdown

execution       comp    db  jack javac  jess  mpeg  mtrt    rt
bytecode         39%   24%   29%   30%   21%   59%   49%   58%
fork              6%   15%   13%   13%   11%    5%    3%    4%
enqueue           4%   10%   10%    9%    7%    3%    2%    2%
join             53%   59%   57%   56%   67%   34%   47%   36%
pred update       7%   13%   12%   11%   12%    6%    7%    7%
dequeue           5%    5%    5%    4%    5%    2%    2%    2%
wait             15%   14%   11%   11%   19%    8%   26%   11%
pred check        4%    4%    4%    5%    7%    3%    2%    3%
buffer check      4%    6%    6%    5%    5%    3%    1%    2%
child pass        5%    5%    7%    6%    6%    3%    2%    3%
child fail       <1%   <1%   <1%   <1%   <1%   <1%   <1%   <1%
cleanup          <1%   <1%   <1%   <1%   <1%   <1%   <1%   <1%

slide-33
SLIDE 33

Speculative Thread Overhead Breakdown

execution       comp    db  jack javac  jess  mpeg  mtrt    rt
child wait       86%   82%   78%   78%   78%   55%   53%   71%
child init        3%    4%    4%    4%    4%    2%    5%    4%
child run         9%   12%   16%   16%   17%   41%   40%   24%
child cleanup    <1%   <1%   <1%   <1%   <1%   <1%   <1%   <1%
bytecode         58%   50%   65%   64%   57%   83%   51%   56%
fork             35%   40%   28%   29%   36%   13%   41%   36%
pred query       33%   38%   25%   26%   33%   11%   38%   33%
join              2%    2%    2%    2%    2%    1%    2%    2%

slide-34
SLIDE 34

Speculative Thread Sizes (single-threaded simulation)

[Figure: percentage of speculative threads (y-axis) against speculative thread size in Java bytecode instructions (x-axis), shown for passed threads and failed threads.]

slide-35
SLIDE 35

Speculative Thread Sizes (multithreaded mode)

[Figure: percentage of speculative threads (y-axis) against speculative thread size in Java bytecode instructions (x-axis), shown for passed threads and failed threads.]

slide-36
SLIDE 36

Speculative Coverage (no RVP)

[Figure: bytecode instructions executed speculatively in parallel (%) against number of processors (1 to 4), for compress, db, jack, javac, jess, mpegaudio, mtrt, and raytrace.]

slide-37
SLIDE 37

Speculative Coverage (with RVP)

[Figure: instructions executed in parallel (%) against number of processors (1 to 4), for compress, db, jack, javac, jess, mpegaudio, mtrt, and raytrace.]

slide-38
SLIDE 38

Outline

1. Introduction
2. Design
3. Experimental Analysis
4. Conclusions and Future Work

slide-39
SLIDE 39

Conclusions

- Automatic parallelisation is a difficult goal.
- We provide a complete design and working implementation.
- The full Java language is handled.
- Overhead costs show where to focus optimisation efforts.
- We showed an increase in parallelism as:
  - Processors are added
  - Return value prediction is added
- SableSpMT is a good base for future research.

slide-40
SLIDE 40

Future Work in SableVM

- Eliminate overhead costs
- Implement speculative locking
- Look at processor scalability
- Allow for children to fork children
- Load value prediction
- Compiler analyses:
  - Speculative dependences
  - Finding good fork points
- Clarify memory model issues
- Compare sequential algorithms running under SpMT against their hand-parallelised equivalents (start with JOlden).

slide-41
SLIDE 41

Future Work in Testarossa and J9

- Implement this design in IBM’s Testarossa JIT / J9 VM
- Initial target is PPC (Power4, Power5)
- Assembly versions of dependence buffer and value predictors
- Some mechanism to switch between non-speculative and speculative code (e.g. on-stack replacement)
- Goal is speedup and an (eventual) PLDI or OOPSLA paper