Marawacc: A Framework for Heterogeneous Computing in Java (PowerPoint PPT Presentation)



SLIDE 1

Marawacc Motivation Marawacc-API Runtime Code Generation Runtime Management Results Conclusion

Marawacc: A Framework for Heterogeneous Computing in Java

Juan Fumero, Michel Steuwer, Christophe Dubach

The University of Edinburgh

UK Many-Core Developer Conference 2016

1 / 23

SLIDE 2

Motivation

2 / 23

SLIDE 3

Motivation

3 / 23

SLIDE 4

Motivation

4 / 23

SLIDE 5

Motivation

5 / 23

SLIDE 6

Marawacc: our approach

Three levels of abstraction

6 / 23

SLIDE 7

Marawacc API

7 / 23

SLIDE 8

Example: Saxpy in Java

float[] v1 = new float[size];
float[] v2 = new float[size];
float[] result = new float[size];

for (int i = 0; i < size; i++) {
    result[i] = alpha * v1[i] + v2[i];
}

8 / 23
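The sequential loop above can be wrapped in a small self-contained method; a minimal sketch (class and method names are illustrative, not part of Marawacc):

```java
import java.util.Arrays;

public class SaxpySequential {
    // Sequential saxpy: result[i] = alpha * v1[i] + v2[i]
    public static float[] saxpy(float alpha, float[] v1, float[] v2) {
        float[] result = new float[v1.length];
        for (int i = 0; i < v1.length; i++) {
            result[i] = alpha * v1[i] + v2[i];
        }
        return result;
    }

    public static void main(String[] args) {
        float[] r = saxpy(2.0f, new float[]{1f, 2f, 3f}, new float[]{10f, 20f, 30f});
        System.out.println(Arrays.toString(r)); // [12.0, 24.0, 36.0]
    }
}
```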

SLIDE 9

Example: Saxpy in Java

Float[] v1 = new Float[size];
Float[] v2 = new Float[size];

ArrayFunc<Tuple2<Float, Float>, Float> f;
f = new MapFunction<>(t -> alpha * t._1() + t._2());

Float[] result = f.zip(v1, v2).apply();

9 / 23
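Without the Marawacc library on the classpath, the element-wise computation that `f.zip(v1, v2).apply()` expresses can be sketched in plain Java; a minimal illustration of the zip-then-map pattern (all names here are hypothetical, not Marawacc's API):

```java
import java.util.Arrays;
import java.util.function.BiFunction;

public class ZipMapSketch {
    // Plain-Java sketch of what a zipped map computes:
    // apply a binary function element-wise over two arrays.
    public static Float[] zipMap(Float[] v1, Float[] v2,
                                 BiFunction<Float, Float, Float> f) {
        Float[] out = new Float[v1.length];
        for (int i = 0; i < v1.length; i++) {
            out[i] = f.apply(v1[i], v2[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        float alpha = 2.0f;
        Float[] r = zipMap(new Float[]{1f, 2f}, new Float[]{10f, 20f},
                           (a, b) -> alpha * a + b);
        System.out.println(Arrays.toString(r)); // [12.0, 24.0]
    }
}
```

In Marawacc the same pattern is captured by an `ArrayFunc` object, which lets the runtime decide later whether to run it on the CPU or generate an OpenCL kernel for the GPU.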

SLIDE 10

Runtime Code Generation

10 / 23

SLIDE 11

Runtime Code Generation

Workflow

Java source (Map.apply(f)) → Java bytecode → Graal VM → CFG + dataflow (Graal IR) → OpenCL kernel (void kernel(global float* input, global float* output) { ... })

  • 1. Type inference
  • 2. IR generation
  • 3. Optimizations
  • 4. Kernel generation

11 / 23

SLIDE 12

Runtime Code Generation

MapFunction<Integer, Double>(x -> x * 2.0)

(Graal IR figure: the initial graph contains calls to Integer.intValue and Double.valueOf, a null check with a guard, and Box/Unbox nodes; after optimization only Param, DoubleConvert, Const(2.0), the multiply, and Return remain.)

inline double lambda0(int p0) {
    double cast_1 = (double) p0;
    double result_2 = cast_1 * 2.0;
    return result_2;
}

12 / 23
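The same simplification can be shown in plain Java: the lambda as written operates on boxed wrapper types, while the optimized graph (and the generated OpenCL function) computes with primitives only. A minimal sketch (class and method names are illustrative):

```java
import java.util.function.Function;

public class BoxingExample {
    // The lambda as written operates on boxed Integer/Double, which is
    // why the naive graph contains intValue/valueOf calls, null checks,
    // and Box/Unbox nodes.
    static final Function<Integer, Double> boxed = x -> x * 2.0;

    // What the optimized graph computes: primitive-only arithmetic,
    // mirroring the generated function lambda0 above.
    public static double unboxed(int p0) {
        double cast_1 = (double) p0;
        double result_2 = cast_1 * 2.0;
        return result_2;
    }

    public static void main(String[] args) {
        System.out.println(boxed.apply(21)); // 42.0
        System.out.println(unboxed(21));     // 42.0
    }
}
```

Both paths produce the same value; the point of the Graal optimizations is that the boxing never has to happen in the generated kernel.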

SLIDE 13

Marawacc: Runtime Management

13 / 23

SLIDE 14

Where is the time spent?

Black-Scholes benchmark. Float[] ⇒ Tuple2<Float, Float>[]

(Bar chart: breakdown of total runtime into unmarshalling, CopyToCPU, GPU execution, CopyToGPU, marshalling, and Java overhead.)

◮ Un/marshalling the data takes up to 90% of the time

◮ The computation step should be dominant

This is not acceptable. Can we do better?

14 / 23
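"Marshalling" here means converting Java object arrays (Float[], Tuple2[]) into the flat primitive buffers OpenCL expects, and "unmarshalling" is the reverse copy. A minimal sketch of that copy (names are illustrative, not Marawacc's internals):

```java
public class MarshalSketch {
    // Marshalling: copy a boxed Float[] into the flat primitive float[]
    // an OpenCL buffer expects. This element-by-element O(n) copy (and
    // the reverse, unmarshalling) is the overhead measured on this slide.
    public static float[] marshal(Float[] boxed) {
        float[] flat = new float[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            flat[i] = boxed[i]; // unboxing copy
        }
        return flat;
    }

    // Unmarshalling: copy the primitive results back into boxed objects.
    public static Float[] unmarshal(float[] flat) {
        Float[] boxed = new Float[flat.length];
        for (int i = 0; i < flat.length; i++) {
            boxed[i] = flat[i]; // boxing copy
        }
        return boxed;
    }
}
```

For tuple arrays the cost is worse still, since every Tuple2 object must be visited and its fields scattered into separate buffers.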

SLIDE 15

Custom Array Type: PArray

(Figure: from the programmer's view a PArray<Tuple2<Float, Double>> is a single array of tuples; inside the Graal-OCL VM it is stored as one flat FloatBuffer holding all the float fields and one flat DoubleBuffer holding all the double fields.)

With this layout, un/marshalling operations are not necessary.

15 / 23
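The struct-of-arrays idea behind PArray can be sketched in a few lines, assuming a Tuple2<Float, Double> element type (class and method names here are illustrative, not Marawacc's actual implementation):

```java
import java.nio.DoubleBuffer;
import java.nio.FloatBuffer;

public class PArraySketch {
    // Struct-of-arrays storage for a Tuple2<Float, Double> array:
    // one flat FloatBuffer for all _1 fields, one flat DoubleBuffer
    // for all _2 fields. These buffers match the layout OpenCL expects,
    // so no un/marshalling step is needed before a GPU transfer.
    private final FloatBuffer firsts;
    private final DoubleBuffer seconds;

    public PArraySketch(int size) {
        this.firsts = FloatBuffer.allocate(size);
        this.seconds = DoubleBuffer.allocate(size);
    }

    public void put(int i, float first, double second) {
        firsts.put(i, first);
        seconds.put(i, second);
    }

    public float getFirst(int i)   { return firsts.get(i); }
    public double getSecond(int i) { return seconds.get(i); }
}
```

The tuple objects exist only in the programmer's view; reads and writes go straight to the primitive buffers.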

SLIDE 16

Saxpy example

Float[] v1 = new Float[size];
Double[] v2 = new Double[size];

ArrayFunc<Tuple2<Float, Double>, Double> f;
f = new MapFunction<>(t -> alpha * t._1() + t._2());

Double[] result = f.zip(v1, v2).apply();

16 / 23

SLIDE 17

Saxpy with our Custom PArrays

Float[] v1 = new Float[size];
Double[] v2 = new Double[size];
PArray input = new PArray(v1, v2);

ArrayFunc<Tuple2<Float, Double>, Double> f;
f = new MapFunction<>(t -> alpha * t._1() + t._2());

PArray<Double> output = f.apply(input);

17 / 23

SLIDE 18

Results

18 / 23

SLIDE 19

OpenCL GPU Execution

AMD R9 and NVIDIA GeForce GTX Titan

(Two bar charts, log scale 0.1 to 1000: speedup over sequential Java for Saxpy, K-Means, Black-Scholes, N-Body, and Monte Carlo on small and large inputs, comparing the marshalling and the optimized (PArray) versions on the AMD R9 and on the NVIDIA GTX Titan.)

19 / 23

SLIDE 20

Comparison with OpenCL C++

AMD R9 and NVIDIA GeForce GTX Titan

(Two bar charts: speedup over sequential code on AMD and on NVIDIA for Saxpy, K-Means, Black-Scholes, N-Body, and Monte Carlo on small and large inputs, comparing Marawacc, Aparapi, and OpenCL C++.)

20 / 23

SLIDE 21

.zip(Conclusions).map(Future)

Present

◮ We have presented the Marawacc framework for programming GPUs from Java

◮ Custom array type to reduce overheads when transforming the data

◮ Runtime system to run heterogeneous applications within Java

Future

◮ Code generation for multiple devices

◮ Runtime scheduling (where is the best place to run the code?)

21 / 23

SLIDE 22

Thanks so much for your attention

Juan Fumero <juan.fumero@ed.ac.uk>

22 / 23

SLIDE 23

OpenCL code generated

double lambda0(float p0) {
    double cast_1 = (double) p0;
    double result_2 = cast_1 * 2.0;
    return result_2;
}

kernel void lambdaComputationKernel(
        global float *p0,
        global int *p0_index_data,
        global double *p1,
        global int *p1_index_data) {
    int p0_dim_1 = 0; int p1_dim_1 = 0;
    int gs = get_global_size(0);
    int loop_1 = get_global_id(0);
    for (;; loop_1 += gs) {
        int p0_len_dim_1 = p0_index_data[p0_dim_1];
        bool cond_2 = loop_1 < p0_len_dim_1;
        if (cond_2) {
            float auxVar0 = p0[loop_1];
            double res = lambda0(auxVar0);
            p1[p1_index_data[p1_dim_1 + 1] + loop_1] = res;
        } else { break; }
    }
}

23 / 23