

slide-1
SLIDE 1

1/23

Architecture-aware Automatic Computation Offload for Native Applications

Gwangmu Lee, Hyunjoon Park, Seonyeong Heo, Kyung-Ah Chang*, Hyogun Lee*, and Hanjun Kim

POSTECH / Samsung Electronics*

slide-2
SLIDE 2

2/23

Mobile devices are slow

[Figure: chess movement computation time, Mobile vs. Desktop, on a 1x to 5x scale. Performance gap: 5~6x.]

void runGame () {
  while (!gameover) {
    Move mv;
    /* User Inputs */
    mv = getPlayerTurn ();
    pieces[mv.tar] = mv.to;
    /* Heavy Computation */
    mv = getAITurn ();
    pieces[mv.tar] = mv.to;
  }
}

slide-3
SLIDE 3

3/23

Offloading can boost your mobile device!

[Figure: the Mobile Device asks "Which piece to move?"; the High-performance Server performs the Computation and returns the Result, "Move the knight to A6!".]

slide-4
SLIDE 4

4/23

Most offloading systems are based on VMs.

CloneCloud (EuroSys'11), MAUI (MobiSys'10), CMCloud (CCGrid'14)

[Figure: Mobile (Android, ARM) runs the Application in a VM; Server (Linux, x86) runs the offloaded Application in a VM.]

slide-5
SLIDE 5

5/23

Native Code Execution Time of Top 20 Android Applications

[Figure: share of execution time spent in native code (0% to 100%) for AdAway, Orbot, Firefox, VLC Player, Open Camera, OsmAnd, Syncthing, AFWall+, 2048, K-9 Mail, PDF Reader, ownCloud, DAVdroid, Barcode Scanner, SatStat, Cool Reader, OS Monitor, Orweb, PPSSPP, and Adblock Plus. Category labels: Web Browser, Video Player, Navigation, E-book Reader, Console Emulator, PDF Reader.]

slide-6
SLIDE 6

6/23

Challenges in offloading native workloads

[Figure: Mobile (Android, ARM) running the ARM Application; Server (Linux, x86). Challenges: Different Architecture, Distinct Memory, Different Memory Layouts. Highlighted: Different Architecture.]

slide-7
SLIDE 7

7/23

Challenges in offloading native workloads

[Figure: Mobile (Android, ARM) running the ARM Application; Server (Linux, x86). Each side has its own Stack, Heap, and Text areas. Challenges: Distinct Memory, Different Memory Layouts, Different Architecture.]

slide-8
SLIDE 8

8/23

Challenges in offloading native workloads

[Figure: Mobile (Android, ARM) running the ARM Application; Server (Linux, x86). Both sides hold ptr = 0x1234 and show int and double fields on the Stack. Challenges: Distinct Memory, Different Memory Layouts, Different Architecture.]

slide-9
SLIDE 9

9/23

Our Strategy 1: Compile Both Binaries!

[Figure: Mobile (Android, ARM) runs the ARM Application; Server (Linux, x86) runs the x86 Offloaded binary. Challenges: Different Architecture, Distinct Memory, Different Memory Layouts.]

slide-10
SLIDE 10

10/23

Our Strategy 2: Unified Virtual Address

[Figure: Mobile (Android, ARM, ARM Application) and Server (Linux, x86, x86 Offloaded), each with Stack, Heap, and Text areas. Challenges: Distinct Memory, Different Memory Layouts, Different Architecture.]

slide-11
SLIDE 11

11/23

Our Strategy 3: Unified Memory Layout

[Figure: the ARM Application on Mobile (Android, ARM) and the x86 Offloaded binary on Server (Linux, x86) both show int, double, and ptr = 0x1234. Challenges: Distinct Memory, Different Memory Layouts, Different Architecture.]

slide-12
SLIDE 12

12/23

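Native Offloader: Structure Overview

[Figure: the Compiler takes Sources and a Profile (runGame: 70%, getAITurn: 68%) through four stages, Target Selection, Virtual Address Unification, Partition, and Server Specific Opt., and emits a Mobile Binary and a Server Binary. The Mobile Binary offloads to the Server Binary.]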

slide-13
SLIDE 13

13/23

Selecting Profitable Targets

Candidate       Exec. Time   Call Count   Mem. Size
runGame         27 s         1            20 MB
getAITurn       26 s         3            12 MB
getPlayerTurn   1.5 s        3            10 MB

Estimation

Candidate       Ideal Gain   Comm.    Estimated Gain
runGame         21.6 s       4 s      17.6 s
getAITurn       20.8 s       7.2 s    13.6 s
getPlayerTurn   1.2 s        6 s      -4.8 s

Target Selection VA Unification Partition Server Specific Opt.

slide-14
SLIDE 14

14/23

Selecting Profitable Targets

Candidate       Exec. Time   Call Count   Mem. Size
runGame         27 s         1            20 MB
getAITurn       26 s         3            12 MB
getPlayerTurn   1.5 s        3            10 MB

Estimation

Candidate       Ideal Gain   Comm.    Estimated Gain
runGame         21.6 s       4 s      17.6 s
getAITurn       20.8 s       7.2 s    13.6 s
getPlayerTurn   1.2 s        6 s      -4.8 s

Selected: getAITurn (Ideal Gain 20.8 s, Comm. 7.2 s, Estimated Gain 13.6 s)

[Figure: call graph of runGame, getAITurn, and getPlayerTurn. getPlayerTurn is annotated "This calls input operations."; runGame and getPlayerTurn are not selected.]
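The slides do not state how the Estimation table is computed. As a rough sketch, the numbers can be reproduced under three assumptions that are mine, not the talk's: the server is 5x faster than the mobile device, every call ships its memory to the server and back, and the link carries 10 MB per second. The struct, function names, and constants below are hypothetical.

```c
#include <stdio.h>

/* Profile data for one offloading candidate (values come from the slide). */
typedef struct {
  const char *name;
  double exec_s;  /* total execution time on the mobile device, seconds */
  int calls;      /* call count */
  double mem_mb;  /* memory size per call, MB */
} Candidate;

/* Assumed constants: picked only because they reproduce the slide's table. */
#define ASSUMED_SPEEDUP 5.0           /* server is 5x faster than mobile */
#define ASSUMED_MB_PER_SECOND 10.0    /* network throughput */

/* Time saved if communication were free. */
double ideal_gain(const Candidate *c) {
  return c->exec_s * (1.0 - 1.0 / ASSUMED_SPEEDUP);
}

/* Time spent sending memory to the server and back, once per call. */
double comm_cost(const Candidate *c) {
  return 2.0 * c->calls * c->mem_mb / ASSUMED_MB_PER_SECOND;
}

/* Net gain; a negative value means offloading would slow the program down. */
double estimated_gain(const Candidate *c) {
  return ideal_gain(c) - comm_cost(c);
}

/* Prints one row per candidate in the layout of the Estimation table. */
void print_estimates(const Candidate *cs, int n) {
  for (int i = 0; i < n; i++) {
    printf("%-14s %6.1f s %6.1f s %6.1f s\n", cs[i].name,
           ideal_gain(&cs[i]), comm_cost(&cs[i]), estimated_gain(&cs[i]));
  }
}
```

Calling print_estimates on the three candidates from the profile table gives the ideal gain, communication cost, and estimated gain per row. A candidate with a negative estimate, such as getPlayerTurn here, is not worth offloading regardless of other constraints.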

slide-15
SLIDE 15

15/23

Unifying Structure Layouts

typedef struct {
  char tar, to;
  double score;
} Move;

<pad>; is the padding member inserted to unify the layout.

[Figure: layout of Move (tar, to, padding, score) on x86, on ARM, and in the Unified form.]
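Struct layouts can differ between ABIs: for instance, the 32-bit x86 System V ABI aligns a double inside a struct to 4 bytes, while the ARM AAPCS aligns it to 8. The slide does not say which ABIs it compares or how the pad member is declared, so the following is only a sketch. The type name UnifiedMove and the 6-byte pad array are assumptions; the idea is that explicit padding pins score to the same offset under either alignment rule, and compile-time checks make any mismatch a build error.

```c
#include <stddef.h>

/* Move with explicit padding (hypothetical stand-in for the slide's <pad>).
 * tar and to occupy bytes 0 and 1; pad fills bytes 2..7; score starts at 8. */
typedef struct {
  char tar, to;
  char pad[6];
  double score;
} UnifiedMove;

/* Compile-time checks: both binaries must agree on these numbers. */
_Static_assert(offsetof(UnifiedMove, score) == 8, "score must sit at offset 8");
_Static_assert(sizeof(UnifiedMove) == 16, "UnifiedMove must be 16 bytes");

/* Helper so the layout can also be inspected at run time. */
size_t unified_score_offset(void) {
  return offsetof(UnifiedMove, score);
}
```

Because offset 8 is a multiple of both 4 and 8, the compiler has no reason to add padding of its own on either side.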

slide-16
SLIDE 16

16/23

Unifying Heap Areas

[Figure: the Heap areas of Mobile and Server.]

board = malloc (sizeof(char) * 32);   /* replaced with u_malloc */
free (board);                         /* replaced with u_free */
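The slide names u_malloc and u_free but does not show how they work. One way to picture a unified heap is an allocator that hands out memory from a single region, so an object sits at the same position in that region on both machines. The bump allocator below is a minimal sketch under that assumption: the static array stands in for a region that a real system would map at the same virtual address on the mobile device and the server, and u_offset is a hypothetical helper added for inspection.

```c
#include <stddef.h>

#define ARENA_SIZE 4096

/* Stand-in for the unified heap region (assumed, not from the slides). */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Allocates size bytes from the arena, rounded up to 16-byte granules.
 * Returns NULL when the arena cannot satisfy the request. */
void *u_malloc(size_t size) {
  size_t rounded = (size + 15u) & ~(size_t)15u;
  if (rounded < size || rounded > ARENA_SIZE - arena_used) {
    return NULL;
  }
  void *p = arena + arena_used;
  arena_used += rounded;
  return p;
}

/* A bump allocator cannot reuse individual blocks, so this is a no-op. */
void u_free(void *p) {
  (void)p;
}

/* Position of an allocation inside the region; identical on both sides
 * as long as both sides perform the same sequence of allocations. */
size_t u_offset(const void *p) {
  return (size_t)((const unsigned char *)p - arena);
}
```

With this sketch, the slide's board = malloc (sizeof(char) * 32) becomes board = u_malloc (sizeof(char) * 32), and the first allocation lands at offset 0 of the region.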

slide-17
SLIDE 17

17/23

Aligning Two Stack Areas

#include <stdlib.h>

int bar (int *i) {
  int j = 60;
  if (*i != 30) abort ();   /* crash */
  return j;
}

int foo () {
  int i = 30;
  return bar (&i);
}

[Figure: on the Mobile Stack, i = 30 and j = 60 occupy separate slots. On the Server Stack, the address pointed to by int *i holds j = 60, so *i == 60 and the check if (*i != 30) crashes.]

slide-18
SLIDE 18

18/23

Partitioning for The Separate Binaries

Mobile Source:

void runGame () {
  while (!gameover) {
    Move mv;
    mv = getPlayerTurn ();
    pieces[mv.tar] = mv.to;
    mv = getAITurn ();            /* replaced by the three calls below */
    pieces[mv.tar] = mv.to;
  }
}

requestOffload (getAITurn);       /* Offload */
sendData ();
receiveData ();                   /* Return */

Server Source:

void listenClient () {
  FunctionID offID;
  while (true) {
    offID = acceptOffload ();
    receiveData ();
    executeFunction (offID);      /* Execute */
    sendData ();
  }
}
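The request, execute, and return steps can be sketched in a single process, with a byte buffer standing in for the network. Everything below is illustrative: the SharedState type, the dispatch table, and the toy getAITurn_server body are assumptions, and the real system transfers memory pages rather than one struct.

```c
#include <stddef.h>
#include <string.h>

typedef enum { FN_GET_AI_TURN = 0, FN_COUNT } FunctionID;

typedef struct { char tar, to; } Move;   /* simplified Move */

/* State shipped between the two sides (hypothetical). */
typedef struct {
  char pieces[32];
  Move result;
} SharedState;

typedef void (*OffloadFn)(SharedState *);

/* Toy stand-in for the heavy computation: advance piece 0 by one square. */
static void getAITurn_server(SharedState *s) {
  s->result.tar = 0;
  s->result.to = (char)(s->pieces[0] + 1);
}

/* Server-side table mapping a function ID to the server's implementation. */
static const OffloadFn offload_table[FN_COUNT] = { getAITurn_server };

/* Server side: one iteration of the listenClient loop. */
int serve_one(FunctionID id, const void *in, void *out, size_t len) {
  SharedState s;
  if ((int)id < 0 || (int)id >= FN_COUNT || len != sizeof s) {
    return -1;
  }
  memcpy(&s, in, sizeof s);     /* receiveData */
  offload_table[id](&s);        /* executeFunction */
  memcpy(out, &s, sizeof s);    /* sendData */
  return 0;
}

/* Mobile side: what replaces the direct call to getAITurn. */
Move offload_getAITurn(SharedState *local) {
  unsigned char wire[sizeof(SharedState)];
  memcpy(wire, local, sizeof *local);                  /* sendData */
  serve_one(FN_GET_AI_TURN, wire, wire, sizeof wire);  /* requestOffload */
  memcpy(local, wire, sizeof *local);                  /* receiveData */
  return local->result;
}
```

The mobile side then continues as in the original loop, applying the returned move with pieces[mv.tar] = mv.to.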

slide-19
SLIDE 19

19/23

Server Specific Optimizations

Function Pointer Mapper

:: Maps the mobile’s function address to the server’s one

Remote I/O

:: The server requests I/O operations from the mobile device remotely.

Please refer to our paper for more details!

slide-20
SLIDE 20

20/23

Remaining Challenges for Mobile Apps

Multi-language Support

:: Mobile apps are written in multiple languages, e.g., Android apps with the NDK (Java + C/C++).

Multi-threaded Applications

:: Emerging mobile applications are multi-threaded.

For these reasons, this work is evaluated on the SPEC benchmark suites.

slide-21
SLIDE 21

21/23

Evaluation

17 SPEC 2000/2006 C Benchmarks

:: LLVM Compile Error: 400.perlbench, 403.gcc
:: Non-profitable target: 197.parser, 254.gap, 255.vortex

Galaxy S5 as the Mobile Device, Dell XPS8700 as the Server

:: Mobile (2.5GHz Quad-core Krait 400)
:: Server (Intel 3.60GHz Quad-core i7-4790)

2 Different Network Bandwidths

:: 802.11n (Maximum 144Mbps)
:: 802.11ac (Maximum 844Mbps)

LLVM Compiler Framework

slide-22
SLIDE 22

22/23

Normalized Execution Time

[Figure: normalized execution time (0.0 to 1.0) for Native Offloader (144 Mbps), Native Offloader (844 Mbps), and Ideal Offloading. Callout: 6.42x Speed-up!]

slide-23
SLIDE 23

23/23

Normalized Battery Consumption

[Figure: normalized battery consumption (0.0 to 1.0) for Native Offloader (144 Mbps), Native Offloader (844 Mbps), and Ideal Offloading. Callout: 82% Saving!]

slide-24
SLIDE 24

24/23

Conclusion

Native Offloader

:: Compiler/runtime cooperative offloading system for general-purpose native applications
:: To minimize offloading overheads, this work unified virtual address spaces.

Fast & Battery-friendly

:: 6.42x Speed-up / 82% Battery saving for 17 SPEC 2000/2006 C Benchmarks

Image: http://www.worth1000.com/contests/27097/interspecies-teamwork-2

Woo-hoo!

The paper also includes remote I/O, function pointer mapping, pointer size / endianness translation, runtime system and comm. optimizations.

slide-25
SLIDE 25

25/23

Native Offloader

Architecture-aware Automatic Computation Offload for Native Applications

Backup Slides

slide-26
SLIDE 26

26/23

How It Works at Runtime

[Figure: Page Table and Physical Pages on Mobile and Server.]

Stage: Local Execution
Stage: Initialization

:: Synchronize
:: Prefetch

slide-27
SLIDE 27

27/23

How It Works at Runtime

[Figure: Page Table and Physical Pages on Mobile and Server.]

Stage: Offloading Execution

:: Page Fault
:: On-demand Copy
:: Dirty Pages

slide-28
SLIDE 28

28/23

How It Works at Runtime

[Figure: Page Table and Physical Pages on Mobile and Server.]

Stage: Finalization

:: Synchronize
:: Write-back
:: Dirty Pages
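The offloading and finalization stages can be pictured with a small simulation: the server copies a page from the mobile device the first time it touches it, marks pages it writes as dirty, and writes only the dirty pages back at the end. This is a sketch only. The Side type, the page counts, and the function names are assumptions, and a real runtime would rely on hardware page faults rather than explicit checks.

```c
#include <stdbool.h>
#include <string.h>

#define NPAGES 4
#define PAGE_SIZE 8

/* Memory image of one machine plus per-page bookkeeping (hypothetical). */
typedef struct {
  unsigned char mem[NPAGES][PAGE_SIZE];
  bool present[NPAGES];  /* page has been copied to this side */
  bool dirty[NPAGES];    /* page was modified on this side */
} Side;

/* Offloading execution: a read of a missing page acts as the page fault
 * and triggers an on-demand copy from the mobile device. */
unsigned char server_read(Side *server, const Side *mobile, int page, int off) {
  if (!server->present[page]) {
    memcpy(server->mem[page], mobile->mem[page], PAGE_SIZE);
    server->present[page] = true;
  }
  return server->mem[page][off];
}

/* A write first faults the page in, then marks it dirty. */
void server_write(Side *server, const Side *mobile, int page, int off,
                  unsigned char value) {
  (void)server_read(server, mobile, page, off);
  server->mem[page][off] = value;
  server->dirty[page] = true;
}

/* Finalization: write dirty pages back; returns how many were sent. */
int write_back(Side *server, Side *mobile) {
  int sent = 0;
  for (int p = 0; p < NPAGES; p++) {
    if (server->dirty[p]) {
      memcpy(mobile->mem[p], server->mem[p], PAGE_SIZE);
      server->dirty[p] = false;
      sent++;
    }
  }
  return sent;
}
```

Pages the server never touches are never transferred, and pages it only reads are not sent back, which is the point of tracking presence and dirtiness separately.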

slide-29
SLIDE 29

29/23

Several offloading systems have already been proposed.

Pointer Analysis-based System: Li et al. (CASES'01), Wang and Li (PLDI'04)

[Figure: instructions (Instr.) and the objects (Obj.) they refer to.]

slide-30
SLIDE 30

30/23

Function Pointer Mapping

Mobile:

void (*fptr) ();
fptr = foo;
fptr("ARGS");

Server:

fptr = toServer(fptr);
fptr("ARGS");

[Figure: Stack, Heap, and Text areas of Mobile and Server; foo sits in the Text area on each side.]
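A function pointer value taken on the mobile device names an address in the mobile binary's text, which means nothing in the server binary. The slide's toServer call can be pictured as a table lookup from mobile address to server function. The table contents, the 32-bit address type, and the example address below are made up for illustration; the slides do not describe the actual mechanism.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*ServerFn)(const char *);

/* One row per offloadable function: where it lives on the mobile device
 * and which server function corresponds to it. */
typedef struct {
  uint32_t mobile_addr;
  ServerFn server_fn;
} FnMapEntry;

/* Toy server-side foo: returns the first character of its argument. */
static int foo_server(const char *args) {
  return args[0];
}

/* Hypothetical mapping table; 0x00008123 is an invented mobile address. */
static const FnMapEntry fn_map[] = {
  { 0x00008123u, foo_server },
};

/* Translates a mobile function address to the server's function pointer.
 * Returns NULL for addresses that have no server counterpart. */
ServerFn toServer(uint32_t mobile_addr) {
  for (size_t i = 0; i < sizeof fn_map / sizeof fn_map[0]; i++) {
    if (fn_map[i].mobile_addr == mobile_addr) {
      return fn_map[i].server_fn;
    }
  }
  return NULL;
}
```

A linear scan is enough for a sketch; a table with many entries would more likely be sorted or hashed so the lookup stays cheap on every indirect call.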

slide-31
SLIDE 31

31/23

Unifying the Virtual Address Space

Installing Pointer Size & Endianness Translators

[Figure: byte sequences 01 23 45 67 01 23 45 67 00 00 00 00 01 23 45 67 00 00 00 00, illustrating Pointer Size Translation and Endianness Translation.]

Original:

int value = *ptr32;

With translators installed:

ptr64 = zext (ptr32);
int value = toLittle(*ptr64);
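The two translators named on the slide can be sketched as plain integer operations. The names zext and toLittle come from the slide, but the signatures are assumptions: zext is taken to widen a 32-bit address to 64 bits with zero fill, and toLittle is taken to byte-swap a 32-bit value that was stored in big-endian order.

```c
#include <stdint.h>

/* Pointer size translation: widen a 32-bit address to 64 bits.
 * The upper 32 bits are zero, so the address value is unchanged. */
uint64_t zext(uint32_t ptr32) {
  return (uint64_t)ptr32;
}

/* Endianness translation: reverse the four bytes of a 32-bit value. */
uint32_t toLittle(uint32_t big_endian_value) {
  return (big_endian_value >> 24)
       | ((big_endian_value >> 8) & 0x0000FF00u)
       | ((big_endian_value << 8) & 0x00FF0000u)
       | (big_endian_value << 24);
}
```

Applied to the slide's byte pattern, zext(0x01234567) keeps the value and only adds four zero bytes, while toLittle(0x01234567) yields 0x67452301. Swapping twice returns the original value, so the same routine serves both directions.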

slide-32
SLIDE 32

32/23

Reallocating Global Variables

Original:

int gvar;
int *gptr = &gvar;
int foo () { *gptr = 400; }

After reallocation:

int *gvar_re = u_malloc(4);
int *gptr = gvar_re;

[Figure: on both Mobile and Server, gvar is shown in the Text area and gvar_re in the Heap.]

slide-33
SLIDE 33

33/23

Server Specific Optimizations

Installing Remote I/O APIs

printf ("%s", "Hello,\n");
printf ("%s", "World!");      /* on the server: r_printf ("%s", "World!"); */

[Figure: the Server sends "World!" to the Mobile, which displays "Hello World!".]
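The slide names r_printf but not its implementation. A minimal sketch, assuming the server formats the text locally and ships the resulting bytes to the mobile device: here a static buffer named mobile_console stands in for the mobile's output stream, and appending to it stands in for the network send. Both are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

#define OUT_CAP 256

/* Stand-in for the mobile device's console (assumed, not from the slides). */
static char mobile_console[OUT_CAP];
static size_t mobile_len = 0;

/* Formats like printf, then "sends" the text by appending it to the
 * mobile console buffer. Returns the number of characters appended,
 * or a negative value if formatting fails. */
int r_printf(const char *fmt, ...) {
  size_t room = OUT_CAP - mobile_len;   /* always at least 1 */
  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(mobile_console + mobile_len, room, fmt, ap);
  va_end(ap);
  if (n < 0) {
    return n;
  }
  if ((size_t)n >= room) {
    n = (int)(room - 1);                /* output was truncated */
  }
  mobile_len += (size_t)n;
  return n;
}
```

Formatting on the server keeps the transfer to a single byte string, so the mobile side needs no knowledge of the format arguments.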

slide-34
SLIDE 34

34/23

Runtime Battery Consumption Pattern

[Figure: battery consumption over time for 445.gobmk (Fast, 802.11ac) and 445.gobmk (Slow, 802.11n).]

Initial Prefetching Overhead : 802.11ac < 802.11n
+ Total Remote I/O Overhead : 802.11ac >> 802.11n
Total Battery Overhead : 802.11ac > 802.11n

slide-35
SLIDE 35

35/23

Runtime Estimator

void runGame () {
  while (!gameover) {
    Move mv;
    mv = getPlayerTurn ();
    pieces[mv.tar] = mv.to;
    if (profitable (getAITurn)) {   /* uses real-time bandwidth information */
      requestOffload (getAITurn);
      sendData ();
      receiveData ();
    }
    pieces[mv.tar] = mv.to;
  }
}
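The slide shows profitable (getAITurn) consulting real-time bandwidth but not what it computes. One plausible reading, sketched below with a different, hypothetical signature, is a break-even test: offload only when the time saved by the faster server exceeds the time needed to move the data at the currently measured bandwidth.

```c
#include <stdbool.h>

/* Returns true when offloading is expected to pay off.
 *   ideal_gain_s       time saved if communication were free, seconds
 *   transfer_mb        data to move in both directions, MB
 *   bandwidth_mb_per_s currently measured throughput, MB per second
 * All three parameters are assumptions about what the estimator knows. */
bool profitable(double ideal_gain_s, double transfer_mb,
                double bandwidth_mb_per_s) {
  if (bandwidth_mb_per_s <= 0.0) {
    return false;   /* no usable link: run locally */
  }
  return ideal_gain_s > transfer_mb / bandwidth_mb_per_s;
}
```

As a worked case, take getAITurn's ideal gain of 20.8 s from the Estimation table and assume 72 MB of total transfer. The break-even bandwidth is 72 / 20.8, roughly 3.5 MB per second: at 10 MB per second the call is worth offloading, at 1 MB per second it is not.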

slide-36
SLIDE 36

36/23

Overhead Analysis

[Figure: normalized execution time breakdown (0.0 to 1.0) with a pair of bars labelled s and f for each of the 17 benchmarks, split into Useful Work, Function Pointer Translation, Remote I/O, and Communication.]

:: Large data size & Ineffective data compression
:: Frequent remote input
:: The algorithm uses function pointers frequently.

slide-37
SLIDE 37

37/23

QUESTIONS & ANSWERS

Thank you for your attention!

Compiler Research Lab (CORELAB)

Dept. of Computer Science & Engineering

POSTECH, Korea

Gwangmu Lee

iss300@postech.ac.kr