SLIDE 1

Experiences with Building Domain-Specific Compilation Plugins in Graal

ManLang’17, 28 Sep 2017 Colin Barrett Christos Kotselidis Foivos S. Zakkak Nikos Foutris Mikel Luján

Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Third party marks and brands are the property of their respective holders.

SLIDE 2

1 Introduction 1/ 16

Is there a way to create domain-specific compiler optimizations without having to learn the whole compilation stack?

Yes! Modular JIT compilers (e.g. Graal)

ManLang’17, 28 Sep 2017

  • F. Zakkak - foivos.zakkak@manchester.ac.uk
SLIDE 4

1 Introduction 2/ 16

Introduction

■ Computer vision applications are becoming mainstream (e.g. autonomous vehicles, virtual reality)
■ On both embedded and desktop environments
■ Ongoing effort to:
  □ Increase accuracy
  □ Optimize performance

SLIDE 5

1 Introduction 3/ 16

Background

Simultaneous Localization And Mapping (SLAM) Applications

Input

Stream of frames from cameras moving in an unknown environment

Output

■ 3D reconstruction of environment
■ Cameras’ location in the environment
■ Absolute positions of objects in the environment

SLIDE 8

2 LSD-SLAM 4/ 16

Our case

Large-Scale Direct monocular SLAM (LSD-SLAM)

■ Monocular: uses a single camera for input
■ Not feature-based: operates directly on image intensities
■ Uses pose-graphs

Pose-graph

A graph where:

■ nodes are frames
■ directed edges contain the transformations (rotation, scaling, and translation) and the corresponding covariance matrix from the previous frame
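
The pose-graph described above could be represented roughly as follows. This is an illustrative sketch only: the names `PoseGraph`, `Edge`, `transform` and `covariance` are hypothetical and are not taken from LSD-SLAM or Indigo. Storing the transformation as a flat 4x4 matrix and the covariance as a flat 7x7 matrix is an assumption, based on Sim(3) having seven degrees of freedom (three for rotation, three for translation, one for scale).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; not the LSD-SLAM or Indigo API.
final class PoseGraph {

    /** Directed edge from the previous frame to the current one. */
    static final class Edge {
        final int from;            // node index of the previous frame
        final int to;              // node index of the current frame
        final float[] transform;   // assumed layout: row-major 4x4 (rotation, scaling, translation)
        final float[] covariance;  // assumed layout: row-major 7x7, one row per Sim(3) degree of freedom

        Edge(int from, int to, float[] transform, float[] covariance) {
            if (transform.length != 16 || covariance.length != 49) {
                throw new IllegalArgumentException("expected 4x4 transform and 7x7 covariance");
            }
            this.from = from;
            this.to = to;
            // Defensive copies: later changes to the caller's arrays do not affect the edge.
            this.transform = transform.clone();
            this.covariance = covariance.clone();
        }
    }

    private int frameCount = 0;
    private final List<Edge> edges = new ArrayList<>();

    /** Adds a node for a new frame and returns its index. */
    int addFrame() {
        return frameCount++;
    }

    /** Adds a directed edge between two existing frames. */
    void addEdge(int from, int to, float[] transform, float[] covariance) {
        if (from < 0 || from >= frameCount || to < 0 || to >= frameCount) {
            throw new IllegalArgumentException("unknown frame index");
        }
        edges.add(new Edge(from, to, transform, covariance));
    }

    int frameCount() {
        return frameCount;
    }

    /** Returns the edges leaving the given frame. */
    List<Edge> outgoing(int from) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from == from) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Each call to `addFrame()` creates a node, and `addEdge(previous, current, ...)` attaches the transformation and covariance that relate the two frames.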

SLIDE 10

2 LSD-SLAM 5/ 16

LSD-SLAM overview

[Figure: LSD-SLAM pipeline; labels: frame, key-frame-x, key-frame-y, key-frame-z, Sim(3) pose]

Tracking

■ create SE(3) pose from frames

Depth Estimation

■ using pose and matched pixels

Map Optimization

■ minimize error in Sim(3) poses

SLIDE 11

2 LSD-SLAM 6/ 16

LSD-SLAM breakdown

[Figure: LSD-SLAM execution breakdown]

Stages:

■ Tracking (40.7%)
■ Depth Estimation (49.4%)
■ Map Optimisation (3.3%)
■ misc.

Kernels:

■ Levenberg-Marquardt Update (40%)
■ Gradient Interpolation (27%)
■ Pose Arithmetic (18.4%), includes SE(3) Logarithm
■ Point Transform (13%)
■ misc.

Mean time per kernel (ns):

Framework      Point Trans.   SE(3) Log.   Gradient Inter.   L-M Update
Eigen (C++)          13.342      131.138             9.847      152.376
EJML (Java)          77.411      415.924            84.479      308.412
JEigen (JNI)       1356.498     1671.105            58.961      895.845
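
To read the table as relative slowdowns rather than absolute times, each framework's mean can be divided by the Eigen mean for the same kernel. The sketch below does only that arithmetic on the numbers from the table; the class and method names are hypothetical.

```java
// Hypothetical helper; the numbers are the mean latencies (ns) from the table above, in the order
// Point Transform, SE(3) Logarithm, Gradient Interpolation, L-M Update.
final class KernelSlowdown {
    static final double[] EIGEN  = {13.342, 131.138, 9.847, 152.376};
    static final double[] EJML   = {77.411, 415.924, 84.479, 308.412};
    static final double[] JEIGEN = {1356.498, 1671.105, 58.961, 895.845};

    /** Returns framework[i] / baseline[i] for every kernel i. */
    static double[] slowdown(double[] framework, double[] baseline) {
        if (framework.length != baseline.length) {
            throw new IllegalArgumentException("arrays must list the same kernels");
        }
        double[] ratio = new double[framework.length];
        for (int i = 0; i < framework.length; i++) {
            ratio[i] = framework[i] / baseline[i];
        }
        return ratio;
    }
}
```

For instance, `KernelSlowdown.slowdown(KernelSlowdown.EJML, KernelSlowdown.EIGEN)[0]` is 77.411 / 13.342, roughly 5.8, so the EJML point transform takes about 5.8 times as long as the Eigen one.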

SLIDE 14

2 LSD-SLAM 7/ 16

Performance Characterization

JIT-compiled code performs worse than the hand-tuned Eigen

■ JIT compiler fails to inline some methods in the critical path
■ Opportunities for constant folding and common sub-expression elimination are missed
■ No SIMD

SLIDE 15

3 Indigo 8/ 16

Indigo: Our Approach

A small vector and matrix library

■ Up to 8 elements (vectors) and 8x8 cells (matrices)

Accompanied by a Graal plugin

■ Forces inlining of the library's methods
■ Custom register allocation
■ SIMD backend

Encapsulated and immutable

■ Reduces object allocation
■ Reduces memory indirection
■ Enables constant folding
■ Enhances common sub-expression elimination
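
As a rough illustration of what "encapsulated and immutable" means for a small vector type, consider the sketch below. It is not Indigo's actual API: the class name `Vec3` and its methods are hypothetical. The point is only that all state lives in private final fields and that every operation returns a new value, so a compiler may treat the fields as constants after construction. How the Graal plugin then avoids the resulting allocations is not shown here.

```java
// Hypothetical immutable 3-element vector; not taken from Indigo.
final class Vec3 {
    // Private final fields: the state is encapsulated and never changes after construction.
    private final float x;
    private final float y;
    private final float z;

    Vec3(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    float x() { return x; }
    float y() { return y; }
    float z() { return z; }

    /** Element-wise sum; returns a new vector instead of mutating this one. */
    Vec3 add(Vec3 o) {
        return new Vec3(x + o.x, y + o.y, z + o.z);
    }

    /** Multiplies every element by the scalar s. */
    Vec3 scale(float s) {
        return new Vec3(x * s, y * s, z * s);
    }

    /** Dot product. */
    float dot(Vec3 o) {
        return x * o.x + y * o.y + z * o.z;
    }

    /** Cross product. */
    Vec3 cross(Vec3 o) {
        return new Vec3(
            y * o.z - z * o.y,
            z * o.x - x * o.z,
            x * o.y - y * o.x);
    }
}
```

Because no method writes to a field, two calls such as `a.dot(b)` with the same operands are guaranteed to give the same result, which is the property that makes constant folding and common sub-expression elimination safe.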

SLIDE 16

3 Indigo 9/ 16

Why a new backend and register allocator?

1. There is no publicly accessible SIMD assembler in Graal
2. The JVM does not support SIMD registers
3. The JVM cannot handle SIMD registers during register spilling

SLIDE 17

3 Indigo 10/ 16

Indigo: Assumptions for SIMD acceleration

■ Hardware supports 128-bit vector operations
■ Indigo’s classes/subclasses contain single-precision floating point numbers, suitable for vector operations in SLAM
■ Unused elements of a vector are zero
■ The elements of a vector are contiguous in memory
■ Once constructed, a vector is immutable
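
The assumptions above can be illustrated in plain Java without any SIMD instructions. In the sketch below (hypothetical names, not Indigo code), a 128-bit register is modelled as a group of four 32-bit floats in a contiguous array, and unused elements are zero-filled. The zero padding is what lets an operation run over whole groups of four without changing the result.

```java
import java.util.Arrays;

// Scalar emulation of 128-bit lanes; hypothetical helper, not Indigo code.
final class PaddedLanes {
    static final int LANES = 4; // 128 bits / 32-bit single-precision float

    /** Copies the values into a contiguous array whose length is a multiple of LANES, zero-filling the rest. */
    static float[] pad(float... values) {
        int padded = ((values.length + LANES - 1) / LANES) * LANES;
        // Arrays.copyOf fills the extra positions with 0.0f.
        return Arrays.copyOf(values, Math.max(padded, LANES));
    }

    /** Dot product computed one group of four at a time, including the padding. */
    static float dot(float[] a, float[] b) {
        if (a.length != b.length || a.length % LANES != 0) {
            throw new IllegalArgumentException("operands must be padded to the same length");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i += LANES) {
            // One iteration stands in for a 4-wide multiply followed by a horizontal add.
            sum += a[i] * b[i] + a[i + 1] * b[i + 1] + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
        }
        return sum;
    }
}
```

A 3-element vector such as `PaddedLanes.pad(1f, 2f, 3f)` occupies four floats with the last one zero, so the fourth lane contributes nothing to the dot product.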

SLIDE 18

3 Indigo 11/ 16

Indigo Compilation Plugin Outline

SLIDE 19

4 Evaluation 12/ 16

Methodology

Comparison

1. Indigo vs Apache CML, as a generic small vectors and matrices Java library
2. Indigo vs Eigen, as a SLAM-specific library

Evaluation Setup

Hardware
  Processor          Intel Core i7 4770 3.4GHz
  Cores              4
  Hardware threads   8
  Main memory        16GB
  Vector Units       SSE 4.2 and AVX2

Software
  OS                 Windows 8.1
  C++ compiler       MSVC 17.00.61030 (x64)
  JVM                Java SE 1.8.0_72 64-Bit JVMCI VM
  Baseline           Apache CML 3.6

SLIDE 20

4 Evaluation 13/ 16

Indigo vs Apache CML: Vector Operations

[Figure: bar chart of speedup (vs Apache CML), axis from 1 to 10, for Indigo and Indigo-SIMD. Operations: Addition, Cross Product, Scalar Division, Dot Product, Hamilton Product, Scalar Multiplication, Subtraction]

SLIDE 21

4 Evaluation 14/ 16

Indigo vs Apache CML: Matrix Operations

[Figure: bar chart of speedup (vs Apache CML), axis from 10 to 70, for Indigo and Indigo-SIMD. Operations: Addition, Scalar Division, Scalar Multiplication, Vector Multiplication, Matrix Multiplication, Subtraction]

SLIDE 22

4 Evaluation 15/ 16

Indigo vs Eigen: SLAM kernels

[Figure: bar chart of speedup (vs Eigen), axis from 0.5 to 3, for Indigo (w/o Graal extensions) and Indigo-SIMD. Kernels: Point Transform, SE(3) Logarithm, Gradient Interpolation, L-M Update]

SLIDE 23

5 Conclusions 16/ 16

Conclusions

■ Domain-specific optimizations have a significant impact on the performance of domain-specific applications
■ Modular JIT compilers like Graal ease such optimizations through plugins
■ Indigo demonstrates that SLAM applications written in Java can be significantly optimized using this approach

SLIDE 24

Thank You!

Experiences with Building Domain-Specific Compilation Plugins in Graal

ManLang’17, 28 Sep 2017 Colin Barrett Christos Kotselidis Foivos S. Zakkak Nikos Foutris Mikel Luján