
Marawacc: A Framework for Heterogeneous Computing in Java

Juan Fumero, Michel Steuwer, Christophe Dubach (The University of Edinburgh, UK)


  1. Marawacc: A Framework for Heterogeneous Computing in Java. Juan Fumero, Michel Steuwer, Christophe Dubach. The University of Edinburgh, UK. Many-Core Developer Conference 2016. Outline: Motivation, Marawacc-API, Runtime Code Generation, Runtime Management, Results, Conclusion.

  2. Motivation

  3. Motivation

  4. Motivation

  5. Motivation

  6. Marawacc: our approach. Three levels of abstraction.

  7. Marawacc API

  8. Example: Saxpy in Java
      float[] v1 = new float[size];
      float[] v2 = new float[size];
      float[] result = new float[size];
      for (int i = 0; i < size; i++) {
          result[i] = alpha * v1[i] + v2[i];
      }

  9. Example: Saxpy in Java
      Float[] v1 = new Float[size];
      Float[] v2 = new Float[size];
      ArrayFunc<Tuple2<Float, Float>, Float> f;
      f = new MapFunction<>(t -> alpha * t._1() + t._2());
      Float[] result = f.zip(v1, v2).apply();
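The zip-then-map pattern on this slide can be approximated in plain Java. The following is only a sequential sketch: the `Pair` class and the `zipMap` helper are hypothetical names introduced for illustration, they are not part of the Marawacc API, and nothing here is offloaded to a GPU.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical stand-in for a two-element tuple; not Marawacc's Tuple2.
final class Pair<A, B> {
    final A first;
    final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }
}

final class ZipMapSketch {
    // Pairs v1[i] with v2[i] and applies f to each pair, sequentially on the CPU.
    static Float[] zipMap(Float[] v1, Float[] v2,
                          Function<Pair<Float, Float>, Float> f) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException("input arrays must have equal length");
        }
        Float[] result = new Float[v1.length];
        for (int i = 0; i < v1.length; i++) {
            result[i] = f.apply(new Pair<>(v1[i], v2[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        float alpha = 2.0f;
        Float[] v1 = {1.0f, 2.0f, 3.0f};
        Float[] v2 = {10.0f, 20.0f, 30.0f};
        // Saxpy expressed as a function over pairs, as on the slide.
        Float[] result = zipMap(v1, v2, t -> alpha * t.first + t.second);
        System.out.println(Arrays.toString(result));
    }
}
```

Because the computation is passed as a function over pairs rather than written as a loop, a runtime is free to decide where and how that function executes, which is the property the slide's API relies on.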

  10. Runtime Code Generation

  11. Runtime Code Generation: Workflow
      [Figure: Java source (Map.apply(f)) is compiled to Java bytecode, which the Graal VM turns into an OpenCL kernel in four steps]
      1. Type inference
      2. IR generation
      3. Optimizations
      4. Kernel generation
      Intermediate form: CFG + data flow (Graal IR)
      Java bytecode excerpt: 10: aload_2, 11: iload_3, 12: aload_0, 13: getfield, 16: aaload, 18: invokeinterface#apply, 23: aastore, 24: iinc, 27: iload_3
      OpenCL kernel excerpt: void kernel(global float* input, global float* output) { ...; ...; }

  12. Runtime Code Generation
      Input: MapFunction<Integer, Double>(x -> x * 2.0)
      [Figure: three Graal IR graphs for this lambda, each from StartNode and Param to Return. One contains MethodCallTarget, Invoke#Integer.intValue, DoubleConvert, Const (2.0), *, MethodCallTarget and Invoke#Double.valueOf. One contains IsNull, GuardingPi (NullCheckException), Unbox, DoubleConvert, Const (2.0), * and Box. One contains only DoubleConvert, Const (2.0) and *.]
      Generated code:
      inline double lambda0(int p0) {
          double cast_1 = (double) p0;
          double result_2 = cast_1 * 2.0;
          return result_2;
      }

  13. Marawacc: Runtime Management

  14. Where is the time spent?
      Black-Scholes benchmark, with the input converted from Float[] to Tuple2<Float, Float>[].
      [Figure: stacked bar of the amount of total runtime (scale 0.0 to 1.0) per stage: Java overhead, Marshaling, CopyToGPU, GPU Execution, CopyToCPU, Unmarshaling]
      ◮ Un/marshaling the data takes up to 90% of the time
      ◮ The computation step should be dominant
      This is not acceptable. Can we do better?

  15. Custom Array Type: PArray
      Programmer's view: PArray<Tuple2<Float,Double>> is an array of n Tuple2 elements (indices 0 to n-1), each holding one float and one double.
      Graal-OCL VM view: one FloatBuffer holding the n float values and one DoubleBuffer holding the n double values, each indexed 0 to n-1.
      With this layout, un/marshal operations are not necessary.
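The layout on this slide (a tuple-like view for the programmer, one primitive buffer per tuple field underneath) can be sketched with standard `java.nio` buffers. This is an illustrative assumption about how such a container could look, not the actual PArray implementation; the class name `FloatDoublePairs` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;
import java.nio.FloatBuffer;

// Hypothetical struct-of-arrays container; not the real PArray class.
final class FloatDoublePairs {
    private final FloatBuffer floats;   // all first components, stored contiguously
    private final DoubleBuffer doubles; // all second components, stored contiguously
    private final int size;

    FloatDoublePairs(int size) {
        this.size = size;
        // Direct buffers in native byte order are the usual choice when the
        // memory is meant to be handed to native code without conversion.
        this.floats = ByteBuffer.allocateDirect(size * Float.BYTES)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        this.doubles = ByteBuffer.allocateDirect(size * Double.BYTES)
                .order(ByteOrder.nativeOrder()).asDoubleBuffer();
    }

    // Writes element `index` as if it were a (float, double) tuple.
    void put(int index, float first, double second) {
        floats.put(index, first);
        doubles.put(index, second);
    }

    float first(int index) {
        return floats.get(index);
    }

    double second(int index) {
        return doubles.get(index);
    }

    int size() {
        return size;
    }
}
```

Since the values are already stored as primitives in two flat buffers, there are no boxed `Float` or `Double` objects to unpack before a copy and none to rebuild afterwards, which is the marshalling step the slide aims to avoid.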

  16. Saxpy example
      Float[] v1 = new Float[size];
      Double[] v2 = new Double[size];
      ArrayFunc<Tuple2<Float, Double>, Double> f;
      f = new MapFunction<>(t -> alpha * t._1() + t._2());
      Double[] result = f.zip(v1, v2).apply();

  17. Saxpy with our Custom PArrays
      Float[] v1 = new Float[size];
      Double[] v2 = new Double[size];
      PArray input = new PArray(v1, v2);
      ArrayFunc<Tuple2<Float, Double>, Double> f;
      f = new MapFunction<>(t -> alpha * t._1() + t._2());
      PArray<Double> output = f.apply(input);

  18. Results

  19. OpenCL GPU Execution: AMD R9 and NVIDIA GeForce GTX Titan
      [Figure: two panels (AMD, Nvidia) plotting speedup vs. Java sequential on a logarithmic axis from 0.1 to 1000, comparing the Marshalling and Optimized versions on small and large inputs for Saxpy, K-Means, Black-Scholes, N-Body and Monte Carlo]

  20. Comparison with OpenCL C++: AMD R9 and NVIDIA GeForce GTX Titan
      [Figure: two panels (speedup over sequential code on AMD, speedup over sequential code on NVIDIA) on a logarithmic axis, comparing Marawacc, Aparapi and OpenCL C++ on small and large inputs for Saxpy, K-Means, Black-Scholes, N-Body and MonteCarlo]

  21. .zip(Conclusions).map(Future)
      Present
      ◮ We have presented Marawacc, a framework for programming GPUs from Java
      ◮ A custom array type to reduce overheads when transforming the data
      ◮ A runtime system to run heterogeneous applications within Java
      Future
      ◮ Code generation for multiple devices
      ◮ Runtime scheduling (where is the best place to run the code?)

  22. Thanks so much for your attention. Juan Fumero <juan.fumero@ed.ac.uk>

  23. OpenCL code generated
      double lambda0(float p0) {
          double cast_1 = (double) p0;
          double result_2 = cast_1 * 2.0;
          return result_2;
      }
      kernel void lambdaComputationKernel(
              global float* p0,
              global int* p0_index_data,
              global double* p1,
              global int* p1_index_data) {
          int p0_dim_1 = 0; int p1_dim_1 = 0;
          int gs = get_global_size(0);
          int loop_1 = get_global_id(0);
          for ( ; ; loop_1 += gs) {
              int p0_len_dim_1 = p0_index_data[p0_dim_1];
              bool cond_2 = loop_1 < p0_len_dim_1;
              if (cond_2) {
                  float auxVar0 = p0[loop_1];
                  double res = lambda0(auxVar0);
                  p1[p1_index_data[p1_dim_1 + 1] + loop_1] = res;
              } else { break; }
          }
      }
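The loop in the generated kernel starts each work-item at its global id and advances it by the global size until it runs past the input length. That iteration scheme can be emulated sequentially in Java as a rough sketch. The class and method names below are hypothetical, the work-items run one after another rather than in parallel, and the index metadata arrays (`p0_index_data`, `p1_index_data`) are left out for brevity.

```java
// Sequential emulation of the kernel's strided loop; illustrative only.
final class GridStrideSketch {
    // Mirrors lambda0 from the slide: widen to double, then multiply by 2.0.
    static double lambda0(float p0) {
        double cast = (double) p0;
        return cast * 2.0;
    }

    // Emulates one work-item: start at globalId, step by globalSize.
    static void workItem(int globalId, int globalSize, float[] in, double[] out) {
        for (int i = globalId; i < in.length; i += globalSize) {
            out[i] = lambda0(in[i]);
        }
    }

    // Runs every work-item in turn; on a device they would run concurrently.
    static double[] run(float[] in, int globalSize) {
        if (globalSize <= 0) {
            throw new IllegalArgumentException("globalSize must be positive");
        }
        double[] out = new double[in.length];
        for (int id = 0; id < globalSize; id++) {
            workItem(id, globalSize, in, out);
        }
        return out;
    }
}
```

With this scheme the number of work-items does not have to match the input length: two work-items over five elements, for example, process indices 0, 2, 4 and 1, 3 respectively, and every element is still written exactly once.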
