Experiences with Building Domain-Specific Compilation Plugins in Graal


  1. Experiences with Building Domain-Specific Compilation Plugins in Graal. ManLang’17, 28 Sep 2017. Colin Barrett, Christos Kotselidis, Foivos S. Zakkak, Nikos Foutris, Mikel Luján. Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Third party marks and brands are the property of their respective holders.

  2. 1 Introduction: Is there a way to create domain-specific compiler optimizations without having to learn the whole compilation stack? Yes! Modular JIT compilers (e.g. Graal). ManLang’17, 28 Sep 2017. F. Zakkak - foivos.zakkak@manchester.ac.uk

  4. 1 Introduction
     ■ Computer vision applications becoming mainstream (e.g. autonomous vehicles, virtual reality)
     ■ Both on embedded and desktop environments
     ■ Ongoing effort to:
       □ Increase accuracy
       □ Optimize performance

  5. 1 Introduction: Background. Simultaneous Localization And Mapping (SLAM) Applications
     Input: stream of frames from cameras moving in an unknown environment
     Output:
     ■ 3D reconstruction of environment
     ■ Cameras’ location in the environment
     ■ Absolute positions of objects in the environment

  8. 2 LSD-SLAM: Our case. Large-Scale Direct monocular SLAM (LSD-SLAM)
     ■ Monocular: uses a single camera for input
     ■ Non feature-based, operates on image intensities
     ■ Uses pose-graphs
     Pose-graph: a graph where:
     ■ nodes are frames
     ■ directed edges contain the transformations (rotation, scaling, and translation) and the corresponding covariance matrix from the previous frame
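
The pose-graph described on this slide could be represented roughly as follows. This is a minimal sketch, not LSD-SLAM's actual data structure: the class names (`PoseGraph`, `FrameNode`, `Edge`), the array layouts, and the 7x7 covariance size (one row per degree of freedom of a rotation, translation, and scale transform) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pose-graph sketch; names and layouts are illustrative, not LSD-SLAM's API.
final class PoseGraph {

    /** A node stands for one frame. */
    static final class FrameNode {
        final int id;

        FrameNode(int id) {
            this.id = id;
        }
    }

    /** A directed edge: the transformation from the previous frame plus its covariance. */
    static final class Edge {
        final FrameNode from;
        final FrameNode to;
        final float[] rotation;    // assumed 3x3, row-major
        final float scale;
        final float[] translation; // assumed 3 elements
        final float[] covariance;  // assumed 7x7, row-major

        Edge(FrameNode from, FrameNode to, float[] rotation, float scale,
             float[] translation, float[] covariance) {
            this.from = from;
            this.to = to;
            // Defensive copies keep the edge unaffected by later changes to the caller's arrays.
            this.rotation = rotation.clone();
            this.scale = scale;
            this.translation = translation.clone();
            this.covariance = covariance.clone();
        }
    }

    private final List<FrameNode> nodes = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    /** Adds a node for a new frame and returns it. */
    FrameNode addFrame() {
        FrameNode node = new FrameNode(nodes.size());
        nodes.add(node);
        return node;
    }

    /** Adds a directed edge between two existing frames. */
    Edge connect(FrameNode from, FrameNode to, float[] rotation, float scale,
                 float[] translation, float[] covariance) {
        Edge edge = new Edge(from, to, rotation, scale, translation, covariance);
        edges.add(edge);
        return edge;
    }

    int nodeCount() {
        return nodes.size();
    }

    int edgeCount() {
        return edges.size();
    }
}
```

A caller would add one node per frame with `addFrame()` and link consecutive frames with `connect(...)`.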

  10. 2 LSD-SLAM: LSD-SLAM overview
      [Diagram labels: frame, key-frame-x, key-frame-y, key-frame-z, SE(3) pose, Sim(3) pose]
      ■ Tracking: create SE(3) pose from frames
      ■ Depth Estimation: using pose and matched pixels
      ■ Map Optimization: minimize error in Sim(3) poses

  11. 2 LSD-SLAM: LSD-SLAM breakdown
      ■ Map Optimisation (3.3%)
      ■ Tracking (40.7%)
      ■ Depth Estimation (49.4%)
      ■ misc.
      Pose Arithmetic (18.4%) includes SE(3) Logarithm, Point Transform, Levenberg-Marquardt Update, Gradient Interpolation, and misc. Further chart labels: (40%), (27%), (13%)

      Framework    | Point Trans. mean (ns) | SE(3) Log. mean (ns) | Gradient Inter. mean (ns) | L-M Update mean (ns)
      Eigen (C++)  | 13.342                 | 131.138              | 9.847                     | 152.376
      EJML (Java)  | 77.411                 | 415.924              | 84.479                    | 308.412
      JEigen (JNI) | 1356.498               | 1671.105             | 58.961                    | 895.845

  14. 2 LSD-SLAM: Performance Characterization
      JIT compiler generated code worse than the hand-tuned Eigen
      ■ JIT compiler fails to inline some methods in the critical path
      ■ Opportunities for constant folding and sub-expression elimination are missed
      ■ No SIMD
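
To make the first two bullets concrete, here is a small hypothetical sketch (the class and method names are invented and are not taken from LSD-SLAM or Indigo). When a helper such as `offset` is not inlined, the compiler generally has to treat every call as opaque, so the repeated sub-expression is recomputed at each call site; once the body is visible, the hand-optimized form shows what common sub-expression elimination can reduce it to.

```java
// Hypothetical illustration of a sub-expression hidden behind a method call.
final class MissedCseSketch {

    // Small helper on the critical path. If the JIT does not inline it,
    // the three calls below stay three separate computations.
    static float offset(float x, float cx, float s) {
        return (x - cx) * s;
    }

    // The same sub-expression is requested three times through calls.
    static float viaCalls(float x, float cx, float s) {
        return offset(x, cx, s) * offset(x, cx, s) + offset(x, cx, s);
    }

    // What inlining followed by common sub-expression elimination amounts to:
    // the shared value is computed once and reused.
    static float handOptimized(float x, float cx, float s) {
        float t = (x - cx) * s;
        return t * t + t;
    }
}
```

Both methods perform the same floating-point operations in the same order, so they return identical results; only the amount of repeated work differs.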

  15. 3 Indigo: Our Approach
      A small vector and matrix library
      ■ Up to 8 elements and 8x8 cells
      Encapsulated and immutable
      ■ Reduces object allocation
      ■ Reduces memory indirection
      ■ Enables constant folding
      ■ Enhances common sub-expression elimination
      Accompanied by a Graal plugin
      ■ Force inline methods of the library
      ■ Custom register allocation
      ■ SIMD backend
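
As a rough illustration of what "encapsulated and immutable" can look like in Java, the sketch below defines a hypothetical three-element vector. `Vec3f` and its methods are invented for this example and are not Indigo's actual API. The fields are final primitives stored directly in the object, so there is no backing array to dereference and no state that can change after construction.

```java
// Hypothetical immutable small vector; not Indigo's actual API.
final class Vec3f {
    // Final primitive fields: no indirection through an array, no mutation after construction.
    final float x;
    final float y;
    final float z;

    Vec3f(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Element-wise sum, returned as a new vector. */
    Vec3f add(Vec3f o) {
        return new Vec3f(x + o.x, y + o.y, z + o.z);
    }

    /** Multiplication of every element by a scalar. */
    Vec3f scale(float s) {
        return new Vec3f(x * s, y * s, z * s);
    }

    /** Dot product. */
    float dot(Vec3f o) {
        return x * o.x + y * o.y + z * o.z;
    }

    /** Cross product. */
    Vec3f cross(Vec3f o) {
        return new Vec3f(y * o.z - z * o.y,
                         z * o.x - x * o.z,
                         x * o.y - y * o.x);
    }
}
```

Because every operation returns a fresh value and never mutates its operands, a compiler that inlines these methods can treat intermediate vectors as plain scalar values, which is the kind of property the bullets above rely on.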

  16. 3 Indigo: Why a new backend and register allocator?
      1. There is no publicly accessible SIMD assembler in Graal
      2. The JVM does not support SIMD registers
      3. The JVM cannot handle SIMD registers during register spillage

  17. 3 Indigo: Assumptions for SIMD acceleration
      ■ Hardware supports 128-bit vector operations
      ■ Indigo’s classes/subclasses contain single-precision floating point numbers suitable for vector operations in SLAM
      ■ Unused elements of a vector are zero
      ■ The elements of a vector are contiguous in memory
      ■ Once constructed, a vector is immutable
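
The assumptions above fit together as follows: a 128-bit register holds four single-precision floats, so a vector whose storage is padded with zeros to a multiple of four can be processed one full register at a time without masking off unused lanes. The sketch below emulates that in plain Java. `PaddedVec` is a hypothetical class written for this illustration, not part of Indigo, and the loops over lanes only stand in for what a SIMD backend would emit as vector instructions.

```java
// Hypothetical sketch of zero-padded, contiguous, immutable vector storage.
final class PaddedVec {
    private static final int LANES = 4; // 128 bits / 32-bit float

    private final float[] lanes; // contiguous storage, length is a multiple of LANES
    private final int size;      // number of meaningful elements

    private PaddedVec(float[] lanes, int size) {
        this.lanes = lanes;
        this.size = size;
    }

    /** Builds a vector of 1 to 8 elements; unused trailing lanes stay zero. */
    static PaddedVec of(float... elements) {
        if (elements.length == 0 || elements.length > 8) {
            throw new IllegalArgumentException("expected 1 to 8 elements");
        }
        int padded = ((elements.length + LANES - 1) / LANES) * LANES;
        float[] storage = new float[padded]; // Java zero-initialises arrays
        System.arraycopy(elements, 0, storage, 0, elements.length);
        return new PaddedVec(storage, elements.length);
    }

    /** Adds every lane, padding included: 0 + 0 keeps the padding zero. */
    PaddedVec add(PaddedVec o) {
        requireSameSize(o);
        float[] result = new float[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            result[i] = lanes[i] + o.lanes[i];
        }
        return new PaddedVec(result, size);
    }

    /** Dot product over every lane: zero padding contributes nothing to the sum. */
    float dot(PaddedVec o) {
        requireSameSize(o);
        float sum = 0f;
        for (int i = 0; i < lanes.length; i++) {
            sum += lanes[i] * o.lanes[i];
        }
        return sum;
    }

    float lane(int i) {
        return lanes[i];
    }

    int paddedLength() {
        return lanes.length;
    }

    private void requireSameSize(PaddedVec o) {
        if (o.size != size) {
            throw new IllegalArgumentException("size mismatch");
        }
    }
}
```

For example, `PaddedVec.of(1f, 2f, 3f)` occupies four lanes with the last one zero, and a five-element vector occupies eight lanes, i.e. two 128-bit registers.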

  18. 3 Indigo: Indigo Compilation Plugin Outline

  19. 4 Evaluation: Evaluation Setup
      Hardware
        Processor: Intel Core i7 4770 3.4GHz
        Cores: 4
        Hardware threads: 8
        Main memory: 16GB
        Vector Units: SSE 4.2 and AVX2
      Software
        OS: Windows 8.1
        C++ compiler: MSVC 17.00.61030 (x64)
        JVM: Java SE 1.8.0_72 64-Bit JVMCI VM
        Baseline: Apache CML 3.6
      Comparison Methodology
        1. Indigo vs Apache CML as a generic small vectors and matrices Java library
        2. Indigo vs Eigen as a SLAM specific library

  20. 4 Evaluation: Indigo vs Apache CML: Vector Operations
      [Bar chart. Series: Indigo, Indigo-SIMD. Y axis: Speedup (vs Apache CML), 0 to 10. Operations: Addition, Cross Product, Scalar Division, Dot Product, Hamilton Product, Scalar Multiplication, Subtraction]

  21. 4 Evaluation: Indigo vs Apache CML: Matrix Operations
      [Bar chart. Series: Indigo, Indigo-SIMD. Y axis: Speedup (vs Apache CML), 0 to 70. Operations: Addition, Scalar Division, Scalar Multiplication, Vector Multiplication, Matrix Multiplication, Subtraction]

  22. 4 Evaluation: Indigo vs Eigen: SLAM kernels
      [Bar chart. Series: Indigo (w/o Graal extensions), Indigo-SIMD. Y axis: Speedup (vs Eigen), 0 to 3. Kernels: Point Transform, SE(3) Logarithm, Gradient Interpolation, L-M Update]

  23. 5 Conclusions
      ■ Domain-specific optimizations have significant impact on the performance of domain-specific applications
      ■ Modular JIT compilers like Graal ease such optimizations through plugins
      ■ Indigo demonstrates that SLAM applications written in Java can be significantly optimized using this approach

  24. Experiences with Building Domain-Specific Compilation Plugins in Graal. ManLang’17, 28 Sep 2017. Colin Barrett, Christos Kotselidis, Foivos S. Zakkak, Nikos Foutris, Mikel Luján. Thank You!
