Hera-JVM: Abstracting Processor Heterogeneity Behind a Virtual Machine



SLIDE 1

Hera-JVM: Abstracting Processor Heterogeneity Behind a Virtual Machine

Ross McIlroy and Joe Sventek

University of Glasgow, Department of Computing Science

Carnegie Trust for the Universities of Scotland

SLIDE 2

Heterogeneous Multi-Core Architectures

  • CPUs are becoming increasingly multi-core
  • Should these cores all be identical?
    • Specialise cores for particular workloads
    • Large core for sequential code, many small cores for parallel code
  • Found in specialist niches currently
    • e.g. network processors (Intel IXP), games consoles (Cell)
  • Likely to become more common
    • On-chip GPUs (AMD Fusion), Intel Larrabee
SLIDE 3

Developing for HMAs

[Diagram: Application Threads]

SLIDE 4

Developing for HMAs

[Diagram: Main Arch Code, Secondary Arch Code; Application Threads]

SLIDE 5

Developing for HMAs

[Diagram: Main Core, Secondary Cores; Main Arch Code, Secondary Arch Code]

SLIDE 6

Developing for HMAs

[Diagram: Main Core, Secondary Cores; Main Arch Code, Secondary Arch Code, Support Code]

SLIDE 9

Developing for HMAs

[Diagram: Main Core, Secondary Cores; Main Arch Code, Secondary Arch Code, Support Code, Libraries; main.o, secondary.o]

SLIDE 10

Hera-JVM

  • Hide this heterogeneity from the application developer
    • Present the illusion of a homogeneous multi-threaded virtual machine
    • The same code will run on either core type
  • Runtime system is aware of heterogeneous resources
    • Can transparently migrate threads between core types based upon this knowledge
  • Provide portable application behaviour hints to enable runtime system to infer the application’s heterogeneity
    • Explicit Code Annotations
    • Static Code Analysis / Typing information
    • Runtime Monitoring / Profiling
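
The "explicit code annotations" hint could look roughly like the sketch below. This is only an illustration: the BehaviourHint annotation, the Behaviour categories and the HintReader helper are hypothetical names invented here, not Hera-JVM's actual API. The point is that the hint describes what the code does, not which core it should run on, so it stays portable.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical behaviour categories (assumed for this sketch).
enum Behaviour { FLOAT_HEAVY, BRANCH_HEAVY, RANDOM_MEMORY_ACCESS, SEQUENTIAL_MEMORY_ACCESS }

// Hypothetical portable hint: says how the method behaves, not where to run it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface BehaviourHint {
    Behaviour[] value();
}

class Kernels {
    @BehaviourHint({Behaviour.FLOAT_HEAVY, Behaviour.SEQUENTIAL_MEMORY_ACCESS})
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // No hint here: a runtime would have to rely on analysis or profiling instead.
    static int parse(String s) {
        return Integer.parseInt(s.trim());
    }
}

class HintReader {
    // Returns the hints declared on a method, or an empty array if there are none.
    static Behaviour[] hintsFor(Class<?> owner, String name, Class<?>... params) {
        try {
            Method m = owner.getDeclaredMethod(name, params);
            BehaviourHint hint = m.getAnnotation(BehaviourHint.class);
            return hint == null ? new Behaviour[0] : hint.value();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }
}
```

A runtime could read such hints through reflection (as HintReader does) or at class-load time; which mechanism Hera-JVM uses is not stated on the slide.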
SLIDE 11

Developing for Hera-JVM

[Diagram: Main Core, Secondary Cores; Application Threads]

SLIDE 12

Developing for Hera-JVM

[Diagram: Main Core, Secondary Cores; Application Threads tagged Integer, Float, Random Memory Access, Branching Code, Sequential Memory Access]

SLIDE 13

Developing for Hera-JVM

[Diagram: Main Core, Secondary Cores; Application Threads tagged Integer, Float, Random Memory Access, Branching Code, Sequential Memory Access; Runtime System holding Main Core Costs and Sec. Core Costs tables with entries "Int, Float, Seq", "Rand", "Rand", "Int, Float"]
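
One way to picture the runtime system's per-core cost tables is a weighted sum over a thread's operation mix, as in the sketch below. This is an assumption about how such costs might be combined, not the scheduler described in the talk. The class names are hypothetical and the cost numbers in illustrative() are made up for the example, not measured.

```java
import java.util.EnumMap;
import java.util.Map;

enum Op { INTEGER, FLOAT, BRANCH, RANDOM_MEMORY, SEQUENTIAL_MEMORY }

enum CoreType { MAIN, SECONDARY }

class CostModel {
    // Relative cost of one operation of each kind on each core type.
    private final Map<CoreType, Map<Op, Double>> costPerOp =
            new EnumMap<CoreType, Map<Op, Double>>(CoreType.class);

    CostModel() {
        for (CoreType core : CoreType.values()) {
            costPerOp.put(core, new EnumMap<Op, Double>(Op.class));
        }
    }

    void setCost(CoreType core, Op op, double cost) {
        costPerOp.get(core).put(op, cost);
    }

    // Weighted sum of costs; 'mix' maps each operation kind to its share of the work.
    double estimate(CoreType core, Map<Op, Double> mix) {
        double total = 0.0;
        for (Map.Entry<Op, Double> e : mix.entrySet()) {
            total += e.getValue() * costPerOp.get(core).getOrDefault(e.getKey(), 1.0);
        }
        return total;
    }

    // Picks the core type with the lower estimated cost (ties go to MAIN).
    CoreType cheapest(Map<Op, Double> mix) {
        return estimate(CoreType.SECONDARY, mix) < estimate(CoreType.MAIN, mix)
                ? CoreType.SECONDARY
                : CoreType.MAIN;
    }

    // Illustrative numbers only: these are NOT measurements from Hera-JVM.
    static CostModel illustrative() {
        CostModel m = new CostModel();
        for (Op op : Op.values()) {
            m.setCost(CoreType.MAIN, op, 1.0);
        }
        m.setCost(CoreType.SECONDARY, Op.INTEGER, 1.0);
        m.setCost(CoreType.SECONDARY, Op.FLOAT, 0.5);
        m.setCost(CoreType.SECONDARY, Op.SEQUENTIAL_MEMORY, 0.5);
        m.setCost(CoreType.SECONDARY, Op.BRANCH, 2.0);
        m.setCost(CoreType.SECONDARY, Op.RANDOM_MEMORY, 4.0);
        return m;
    }
}
```

With these made-up costs, a thread whose mix is mostly floating point and sequential memory access would be placed on a secondary core, while a branch-heavy thread with random memory access would stay on the main core.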

SLIDE 21

Cell Processor

SLIDE 22

A JVM for Two Architectures

  • Built upon JikesRVM
  • Java in Java
  • PowerPC and x86 support
SLIDE 23

A JVM for Two Architectures

  • Built upon JikesRVM
  • Java in Java
  • PowerPC and x86 support

[Diagram: PPE Assembler, Low Level Assembly, PPE Compiler; Runtime System, Java Library, Application]

SLIDE 24

A JVM for Two Architectures

  • Built upon JikesRVM
  • Java in Java
  • PowerPC and x86 support

[Diagram: PPE Assembler, Low Level Assembly, PPE Compiler; SPE Assembler; Runtime System, Java Library, Application]

SLIDE 25

A JVM for Two Architectures

  • Built upon JikesRVM
  • Java in Java
  • PowerPC and x86 support

[Diagram: PPE Assembler, Low Level Assembly, PPE Compiler; SPE Assembler, Low Level Assembly; Runtime System, Java Library, Application]

SLIDE 26

A JVM for Two Architectures

  • Built upon JikesRVM
  • Java in Java
  • PowerPC and x86 support

[Diagram: PPE Assembler, Low Level Assembly, PPE Compiler; SPE Assembler, Low Level Assembly, SPE Compiler; Runtime System, Java Library, Application]

SLIDE 31

Migration

  • A thread can migrate between the PPE and SPE cores at any method invocation
    • Migration is triggered either by an explicit annotation or is signalled dynamically by the scheduler
    • Syscalls and native methods always migrate back to PPE
  • Migration from core type A to B:
    • Thread “traps” to support code on core A, which saves arguments
    • Method JITed for core type B if required
    • Migration marker and migration support frame pushed onto stack
    • Thread placed on ready queue of core type B
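
The four migration steps listed above can be walked through with a toy simulation. This is a sketch under stated assumptions, not JikesRVM or Hera-JVM code: SimThread, MigrationSim and the string-based stack frames are hypothetical stand-ins, and "JIT compilation" is reduced to recording that a method has been compiled for a core type.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

enum Core { PPE, SPE }

// Hypothetical stand-in for a Java thread's migration-relevant state.
class SimThread {
    final String name;
    Core core;
    final Deque<String> stack = new ArrayDeque<String>();
    Object[] savedArgs = new Object[0];

    SimThread(String name, Core core) {
        this.name = name;
        this.core = core;
    }
}

class MigrationSim {
    private final Map<Core, Queue<SimThread>> readyQueues =
            new EnumMap<Core, Queue<SimThread>>(Core.class);
    // "method@core" entries for methods already compiled for that core type.
    private final Set<String> compiled = new HashSet<String>();

    MigrationSim() {
        for (Core c : Core.values()) {
            readyQueues.put(c, new ArrayDeque<SimThread>());
        }
    }

    // Syscalls and native methods always run on the PPE, whatever was requested.
    static Core targetFor(boolean isNativeOrSyscall, Core requested) {
        return isNativeOrSyscall ? Core.PPE : requested;
    }

    // Migrates a thread at a method invocation, following the four steps on the slide.
    void migrate(SimThread t, Core target, String method, Object... args) {
        if (t.core == target) {
            return; // already on the right core type
        }
        t.savedArgs = args.clone();                // 1. trap to support code, save arguments
        compiled.add(method + "@" + target);       // 2. compile for the target core if not done yet
        t.stack.push("migration-marker");          // 3. push marker and support frame
        t.stack.push("migration-support-frame");
        t.core = target;
        readyQueues.get(target).add(t);            // 4. place on the target core type's ready queue
    }

    boolean isCompiled(String method, Core core) {
        return compiled.contains(method + "@" + core);
    }

    int readyCount(Core core) {
        return readyQueues.get(core).size();
    }
}
```

The marker and support frame are what would let the thread migrate back when the invoked method returns; how Hera-JVM unwinds them is not covered on the slide, so the sketch stops at enqueueing.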
SLIDE 32

SPE Local Memory

  • Instead of a cache, SPEs have 256KB of explicitly accessible local memory
  • Main memory accessed through DMA using MFC (Memory Flow Controller)
  • Setting up many small DMA transfers is costly

[Diagram: Main Memory, Local Memory, MFC, SPE]

SLIDE 33

Software Caching in a High Level Language

  • Java bytecodes are typed, therefore we have high level knowledge of what’s being cached
    • Cache an object completely when it is accessed
    • Cache arrays in 1KB blocks
  • Java memory model only requires coherency operations at synchronisation points
  • Methods are cached in their entirety when invoked
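
The array-caching rule above can be sketched as a read-only block cache. Only the 1KB block size comes from the slide. The rest is assumed for illustration: the ArrayBlockCache class is hypothetical, System.arraycopy stands in for a DMA transfer, and the coherency operation is simplified to discarding all cached blocks.

```java
import java.util.HashMap;
import java.util.Map;

// Toy software cache for an int[] that lives in "main memory".
class ArrayBlockCache {
    static final int BLOCK_BYTES = 1024;                        // block size from the slide
    static final int INT_BYTES = 4;
    static final int INTS_PER_BLOCK = BLOCK_BYTES / INT_BYTES;  // 256 ints per block

    private final int[] mainMemory;                             // stands in for the array in main memory
    private final Map<Integer, int[]> blocks = new HashMap<Integer, int[]>();
    int transfers = 0;                                          // number of simulated DMA transfers

    ArrayBlockCache(int[] mainMemory) {
        this.mainMemory = mainMemory;
    }

    // Reads one element, fetching its whole 1KB block on a miss.
    int read(int index) {
        if (index < 0 || index >= mainMemory.length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        int block = index / INTS_PER_BLOCK;
        int[] local = blocks.get(block);
        if (local == null) {
            int start = block * INTS_PER_BLOCK;
            int len = Math.min(INTS_PER_BLOCK, mainMemory.length - start);
            local = new int[len];
            System.arraycopy(mainMemory, start, local, 0, len); // one bulk transfer per block
            blocks.put(block, local);
            transfers++;
        }
        return local[index % INTS_PER_BLOCK];
    }

    // Simplified coherency operation: drop local copies so later reads re-fetch.
    void onSynchronisationPoint() {
        blocks.clear();
    }
}
```

Fetching a whole block per miss trades a little wasted bandwidth for far fewer transfer setups, which matters because setting up many small DMA transfers is costly on the SPE. Writes and write-back are left out of the sketch.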

SLIDE 34

Hera-JVM Performance

Single Threaded

SPE vs. PPE Speedup

[Chart: speedup axis from 0 to 2; benchmarks: sm.sor, Mandelbrot, sm.sparse, sm.lu, sm.montecarlo, MolDyn, sm.fft, MpegAudio, Ray Tracer, MonteCarlo, Compress]

SLIDE 36

Hera-JVM Performance

Multi-Threaded (6 threads)

6 SPEs vs. PPE Speedup

[Chart: speedup axis from 0 to 12; benchmarks: sm.sor, Mandelbrot, sm.sparse, sm.lu, sm.montecarlo, MolDyn, sm.fft, MpegAudio, Ray Tracer, MonteCarlo, Compress]

SLIDE 37

Proportion of Execution Time by Operation

[Chart: share of execution time, 0% to 100%, for compress, mpegaudio and mandelbrot; categories: Floating Point, Integer, Branch, Stack, Local Memory, Main Memory]

SLIDE 38

Data Cache Hit-Rate

[Chart: axes: Data Cache Size (KB) from 0 to 104, Performance (relative to default cache size), Data Hit Rate; series: compress, mpegaudio, mandelbrot]

SLIDE 39

Code Cache Hit-Rate

[Chart: axes: Code Cache Size (KB) from 0 to 88, Performance (relative to default cache size), Method Hit Rate; series: compress, mpegaudio, mandelbrot]

SLIDE 40

Conclusion / Future Work

  • Architectures are likely to become more heterogeneous
  • This heterogeneity should be taken out of the hands of non-specialist programmers
    • Instead, hide this heterogeneity from the programmer and provide abstractions to infer a program’s heterogeneity
    • E.g. code annotations, runtime monitoring, etc.
  • Hera-JVM is a proof of concept of this approach
    • Overheads involved in hiding the heterogeneity are tolerable for most applications
  • Next Stage: Fully integrate behaviour tagging with scheduling / migration decisions