Automatic Generation of OpenCL Code for ARM Architectures

SLIDE 1

Automatic Generation of OpenCL Code for ARM Architectures

Sergio Afonso safonsof@ull.es Alejandro Acosta aacostad@ull.es Francisco Almeida falmeida@ull.es

High Performance Computing Group: http://cap.pcg.ull.es/

SLIDE 2

Introduction

  • Systems on Chip have gained performance, driven by the growth of the smartphone market
  • Heterogeneity exists both between devices and between the processors inside each device
  • Writing portable, high-performance code for such heterogeneous platforms is still difficult
  • Tools that automatically produce accelerated code exist (OpenMP, OpenACC), but they are not designed for mobile application development
  • A wide range of computer vision and image processing applications on mobile devices benefit from parallel processing

Sergio Afonso, Alejandro Acosta, Francisco Almeida Automatic Generation of OpenCL Code for ARM Architectures

SLIDE 3

Android

[Diagram: Android build and execution flow. Java source code (.java) → Java compiler → Java bytecode (.class) → Dex compiler → Dalvik executable (.dex). Native source code (.c) → GCC compiler → native libraries, reached through JNI. Renderscript source code (.rs) → llvm-rs-cc compiler → LLVM bytecode (.bc) → LIBBCC (LLVM). Everything is bundled in the Android application package (.APK) and runs on the Android Run Time (ART) over the hardware (CPUs, GPU, memory, ...).]

  • Java: Main programming language of Android and the easiest to program, with an extensive range of tools provided for it
  • Renderscript: A language for high performance computing on Android that supports data parallelism
  • Native (JNI): Useful for reusing C/C++ code and for accessing vendor-specific native libraries

SLIDE 4

Paralldroid

[Diagram: the Android build and execution flow of the previous slide with Paralldroid placed in front of it. Paralldroid takes Java source code (.java) and, through code refactoring, a Renderscript generator and native / OpenCL generators, emits Java source code (.java), native source code (.c) and Renderscript source code (.rs) into the standard toolchain.]

  • It unifies all Android programming models: it can generate Renderscript, OpenCL and native code
  • It defines a set of annotations that suit a Java program more naturally than, e.g., OpenMP
  • It makes developing parallel methods substantially easier than writing plain OpenCL plus all the code needed to run it from Java
  • It is implemented as an extension to the OpenJDK compiler
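
As a rough illustration of the annotation idea, the sketch below declares a marker annotation and finds the methods that carry it. Everything in it is hypothetical (the names ParallelSketch, SketchFilter and AnnotationScan are invented here), and it uses runtime reflection only to stay self-contained; Paralldroid itself is described as working at compile time inside the OpenJDK compiler, which this sketch does not reproduce.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation; it only mirrors the idea of marking a method as parallel.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ParallelSketch {}

// Hypothetical user class with one annotated method and one plain method.
class SketchFilter {
    @ParallelSketch
    public void run(int x, int y) { /* per-element work would go here */ }

    public void setup() { /* ordinary sequential code */ }
}

class AnnotationScan {
    // Returns the names of the methods of the given class that carry @ParallelSketch.
    static List<String> parallelMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isAnnotationPresent(ParallelSketch.class)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parallelMethods(SketchFilter.class));
    }
}
```

A code generator would use such a list to decide which methods to translate and which to leave as ordinary Java.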

SLIDE 5

Paralldroid

[Diagram: Paralldroid translation pipeline. Java code → OpenJDK Java parser → Java AST → annotations detector, followed by one of three branches. OpenCL: Java AST translator and OCL tree translator → native AST → native code and Java code, plus OCL kernel tree translator → OpenCL AST → OpenCL code. Native: Java AST translator and native tree translator → native AST → native code and Java code. Renderscript: Java AST translator and RS tree translator → RS AST → RS code and Java code.]
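
The last stage of such a pipeline, turning a tree into target source text, can be sketched in a few lines. This is a minimal illustration under assumptions, not Paralldroid's implementation: the miniature AST (Expr, ArrayRead, Const, BinOp) and the KernelGenerator class are invented here, and only element-wise float kernels are covered. The emitted text uses standard OpenCL C (`__kernel`, `__global`, `get_global_id`).

```java
// Hypothetical miniature AST: just enough to express "out[i] = a[i] * k + b[i]".
interface Expr {
    String toOpenCL();
}

// Reads element i of a named input array.
final class ArrayRead implements Expr {
    private final String array;
    ArrayRead(String array) { this.array = array; }
    public String toOpenCL() { return array + "[i]"; }
}

// A float literal, printed with the "f" suffix used by OpenCL C.
final class Const implements Expr {
    private final float value;
    Const(float value) { this.value = value; }
    public String toOpenCL() { return value + "f"; }
}

// A binary operation; always parenthesized so precedence is explicit.
final class BinOp implements Expr {
    private final char op;
    private final Expr left;
    private final Expr right;
    BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    public String toOpenCL() {
        return "(" + left.toOpenCL() + " " + op + " " + right.toOpenCL() + ")";
    }
}

final class KernelGenerator {
    // Emits a one-dimensional kernel that writes body(i) into output[i].
    static String generate(String name, String[] inputs, String output, Expr body) {
        StringBuilder sb = new StringBuilder();
        sb.append("__kernel void ").append(name).append("(");
        for (String in : inputs) {
            sb.append("__global const float* ").append(in).append(", ");
        }
        sb.append("__global float* ").append(output).append(") {\n");
        sb.append("    int i = get_global_id(0);\n");
        sb.append("    ").append(output).append("[i] = ").append(body.toOpenCL()).append(";\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Expr body = new BinOp('+',
                new BinOp('*', new ArrayRead("a"), new Const(2.0f)),
                new ArrayRead("b"));
        System.out.print(generate("scale", new String[] {"a", "b"}, "out", body));
    }
}
```

A real generator would also have to emit the host-side code that compiles the kernel, copies buffers and launches it, which is the part the native AST branch of the diagram accounts for.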

SLIDE 6

Example

import android.graphics.Bitmap;
import android.graphics.Color;
// Paralldroid annotation imports are not shown on the slide.

@Target(OPENCL)
public class GrayScale {
    // Luminance weights for the red, green and blue channels
    @Declare
    private float gMonoMult[] = {0.299f, 0.587f, 0.114f};

    @Map(TO)
    private int width;
    @Map(TO)
    private int height;

    public GrayScale(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Body executed once per pixel (x, y)
    @Parallel
    public void run(@Input Bitmap src, @Output Bitmap out,
                    @Index int x, @Index int y) {
        int pixel = src.getPixel(x, y);
        int acc;
        acc = (int)(Color.red(pixel) * gMonoMult[0]);
        acc += (int)(Color.green(pixel) * gMonoMult[1]);
        acc += (int)(Color.blue(pixel) * gMonoMult[2]);
        out.setPixel(x, y, Color.argb(Color.alpha(pixel), acc, acc, acc));
    }
}
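
For comparison, the same per-pixel computation can be written as an ordinary sequential loop. The sketch below is an assumption-laden stand-in rather than Paralldroid output: it replaces Bitmap with a packed ARGB int array so that it runs outside Android, and the class name GrayScaleReference is invented here. The two nested loops are what the @Index parameters stand in for in the annotated version.

```java
final class GrayScaleReference {
    // Same luminance weights as in the annotated example.
    private static final float[] MONO_MULT = {0.299f, 0.587f, 0.114f};

    // Converts one packed ARGB pixel to gray, keeping its alpha channel.
    static int toGray(int pixel) {
        int alpha = (pixel >>> 24) & 0xFF;
        int red = (pixel >> 16) & 0xFF;
        int green = (pixel >> 8) & 0xFF;
        int blue = pixel & 0xFF;

        // Each term is truncated separately, as in the slide's code.
        int acc = (int) (red * MONO_MULT[0]);
        acc += (int) (green * MONO_MULT[1]);
        acc += (int) (blue * MONO_MULT[2]);

        return (alpha << 24) | (acc << 16) | (acc << 8) | acc;
    }

    // Sequential version: visits every (x, y) of a row-major image.
    static int[] run(int[] src, int width, int height) {
        int[] out = new int[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                out[y * width + x] = toGray(src[y * width + x]);
            }
        }
        return out;
    }
}
```

Because each term is truncated before summing, an opaque white pixel maps to gray level 254 rather than 255.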

SLIDE 7

Computational results

[Figure: four speed-up plots, each with a speed-up axis marked from 5 to 20 and the series Gen. OCL | SXZ, Gen. RS | SXZ, Gen. OCL | XU3 and Gen. RS | XU3.
  • 640x480: speed-up per algorithm (GrayScale, Levels, Convolve3x3, Convolve5x5)
  • 1920x1080: speed-up per algorithm (GrayScale, Levels, Convolve3x3, Convolve5x5)
  • GrayScale: speed-up per image dimensions (640x480, 1024x768, 1280x720, 1366x768, 1600x900, 1920x1080 px)
  • Convolve5x5: speed-up per image dimensions (640x480, 1024x768, 1280x720, 1366x768, 1600x900, 1920x1080 px)]
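
The slides do not say which baseline the speed-up is measured against. Assuming the usual definition, baseline execution time divided by the execution time of the generated code, the quantity on the vertical axis could be computed as in this small sketch (the class name SpeedUp is invented here):

```java
final class SpeedUp {
    // Ratio of baseline time to accelerated time; values above 1 mean the generated code is faster.
    static double of(long baselineNanos, long acceleratedNanos) {
        if (baselineNanos <= 0 || acceleratedNanos <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return (double) baselineNanos / acceleratedNanos;
    }
}
```

Under this definition, a value of 20 on the axis would mean the generated code ran in one twentieth of the baseline time.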

SLIDE 8

Conclusion

  • Paralldroid eases the acceleration of Android applications by means of Java annotations and AST (Abstract Syntax Tree) transformations
  • Our annotations are familiar to developers who know OpenMP, but they are also adapted to the object-oriented programming paradigm
  • Each Java class in an application can be implemented using a different programming language, so that each algorithm can transparently run using the one that provides the best results
  • We are currently working on improving the performance of the OpenCL backend

Acknowledgement: This work was supported by the Spanish Ministry of Education and Science through the TIN2011-24598 and TIN2016-78919-R projects, the CAPAP-H network, the NESUS IC1315 COST Action and the EC (ERDF)
