1/23
Architecture-aware Automatic Computation Offload for Native Applications
Gwangmu Lee, Hyunjoon Park, Seonyeong Heo, Kyung-Ah Chang*, Hyogun Lee*, and Hanjun Kim
POSTECH / Samsung Electronics*
2/23
Figure: performance gap between a mobile device and a desktop on chess movement computation (scale 1x to 5x).
3/23
Diagram: the mobile device offloads computation to a high-performance server, which returns the computation result.
4/23
Prior VM-based offloading: CloneCloud (EuroSys'11), MAUI (MobiSys'10), CMCloud (CCGrid'14).
Diagram: the application runs inside a VM on both the mobile (ARM, Android) and the server (x86, Linux); the offloaded part runs in the server's VM.
5/23
Figure: per-application chart (0% to 100% scale) for popular Android applications, grouped by category (Web Browser, Video Player, Navigation, E-book Reader, Console Emulator, PDF Reader): AdAway, Orbot, Firefox, VLC Player, Open Camera, Syncthing, AFWall+, 2048, K-9 Mail, PDF Reader, DAVdroid, Barcode Scanner, SatStat, Cool Reader, OS Monitor, Orweb, PPSSPP, Adblock Plus.
6/23
Diagram: an ARM application on an Android (ARM) mobile device, and a Linux (x86) server.
7/23
Diagram: memory layout (stack, heap, text) of the ARM application on the Android (ARM) side and on the Linux (x86) side.
8/23
Diagram: the stack of the ARM application on the Android (ARM) side and on the Linux (x86) side, each holding ptr = 0x1234, an int, and a double.
9/23
Diagram: the ARM application on the Android (ARM) mobile device and its x86 offloaded counterpart on the Linux (x86) server.
10/23
Diagram: memory layouts (stack, heap, text) of the ARM application and of the x86 offloaded code.
11/23
Diagram: the same data (an int, a double, and ptr = 0x1234) in the ARM application and in the x86 offloaded code, with the int and double layouts of each side.
12/23
Diagram: the application is compiled into a mobile binary and a server binary.
Profile: runGame 70%, getAITurn 68%; getAITurn is offloaded.
13/23
Native Offloader stages: Target Selection → VA Unification → Partition → Server-Specific Optimization.
14/23
Target Selection: candidate functions runGame, getAITurn, and getPlayerTurn. getPlayerTurn calls input operations, and runGame calls getPlayerTurn, so neither can be offloaded; getAITurn is the offloading target.
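A minimal sketch of the selection rule illustrated on this slide, under stated assumptions: the call graph and the per-function input flags are written by hand, a function is rejected if it performs input directly or through any callee, and the remaining function with the largest profiled share wins. The shares 70% (runGame) and 68% (getAITurn) come from the profile slide; getPlayerTurn's share is not given, so 0 is a placeholder. `select_target`, `reaches_input`, and the enum constants are hypothetical names, not Native Offloader's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IDs for the three candidate functions. */
enum { RUN_GAME, GET_PLAYER_TURN, GET_AI_TURN, NUM_FUNCS };

/* calls[a][b] is true when function a calls function b. */
static const bool calls[NUM_FUNCS][NUM_FUNCS] = {
    [RUN_GAME] = { [GET_PLAYER_TURN] = true, [GET_AI_TURN] = true },
};

/* Functions that perform input operations themselves. */
static const bool does_input[NUM_FUNCS] = { [GET_PLAYER_TURN] = true };

/* Profiled execution share in percent; getPlayerTurn's 0 is a placeholder. */
static const int share[NUM_FUNCS] = {
    [RUN_GAME] = 70, [GET_PLAYER_TURN] = 0, [GET_AI_TURN] = 68,
};

/* True if f performs input directly or through any callee.
   Assumes an acyclic call graph, so the recursion terminates. */
static bool reaches_input(int f) {
    assert(f >= 0 && f < NUM_FUNCS);
    if (does_input[f])
        return true;
    for (int g = 0; g < NUM_FUNCS; g++)
        if (calls[f][g] && reaches_input(g))
            return true;
    return false;
}

/* Returns the input-free function with the largest share, or -1 if none. */
static int select_target(void) {
    int best = -1;
    for (int f = 0; f < NUM_FUNCS; f++) {
        if (reaches_input(f))
            continue;
        if (best < 0 || share[f] > share[best])
            best = f;
    }
    return best;
}
```

With this graph, runGame is rejected despite its larger share because it reaches getPlayerTurn's input, so `select_target()` returns `GET_AI_TURN`.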
15/23
VA Unification, data layout: the fields tar, to, and score of a struct are laid out differently on x86 and ARM (the ARM layout contains padding), so both sides use one unified layout.
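One way such a layout mismatch can arise, and one way to pin a single layout in C11, sketched under assumptions: the field types are hypothetical (the slide names only tar, to, and score), and the example relies on the ABI difference that a `double` struct member is 4-byte aligned under the i386 System V ABI but 8-byte aligned under the ARM AAPCS. An `_Alignas(8)` specifier fixes `to` at offset 8 on both targets; the slide does not say that Native Offloader unifies layouts this way.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Hypothetical field types: the slide names only tar, to, and score. */
struct Move {
    int tar;
    double to;    /* offset 4 on i386 System V, 8 on ARM AAPCS */
    int score;
};

/* Unified layout: _Alignas(8) gives 'to' offset 8 on every target,
   with explicit padding after 'tar'. */
struct UnifiedMove {
    int tar;
    _Alignas(8) double to;
    int score;
};

/* Compile-time checks, assuming 4-byte int and 8-byte double. */
static_assert(offsetof(struct UnifiedMove, to) == 8, "to must sit at offset 8");
static_assert(offsetof(struct UnifiedMove, score) == 16, "score must sit at offset 16");
static_assert(sizeof(struct UnifiedMove) == 24, "unified size must be 24 bytes");
```

The static assertions turn any target where the unified layout would drift into a compile error instead of silent data corruption after offloading.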
16/23
VA Unification, heap: diagram of the heap on the mobile and on the server.
17/23
VA Unification, stack: on the mobile stack, i = 30; on the server stack, j = 60 sits at the same address. An int *i that points to i on the mobile therefore reads *i == 60 on the server. Crash!
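A toy model of this crash, assuming a flat int array as memory, a pointer as an index into it, stacks that grow upward, and a server that receives a copy of the mobile memory before running. If the server starts its frames at the same base as the mobile's live frames, its local j lands on the slot that i points to; starting the server stack past the mobile frames (the unified choice in this sketch) preserves *i == 30. `AddressSpace`, `push_local`, `deref`, and `offload_and_deref` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 64

/* Toy address space: memory is an int array and a pointer is an index. */
typedef struct {
    int mem[MEM_SIZE];
    uint32_t stack_base;   /* first slot of this side's next stack frame */
} AddressSpace;

/* Stores value at stack_base + slot and returns that address. */
static uint32_t push_local(AddressSpace *as, uint32_t slot, int value) {
    uint32_t addr = as->stack_base + slot;
    assert(addr < MEM_SIZE);
    as->mem[addr] = value;
    return addr;
}

/* Reads the int stored at addr. */
static int deref(const AddressSpace *as, uint32_t addr) {
    assert(addr < MEM_SIZE);
    return as->mem[addr];
}

/* Offloads: the server gets a copy of the mobile memory, pushes its own
   local j = 60 starting at server_stack_base, then dereferences p. */
static int offload_and_deref(const AddressSpace *mobile, uint32_t p,
                             uint32_t server_stack_base) {
    AddressSpace server = *mobile;      /* memory copied to the server */
    server.stack_base = server_stack_base;
    push_local(&server, 0, 60);         /* server-side local j */
    return deref(&server, p);
}
```

With the mobile's i = 30 stored at address 8, a server stack base of 8 makes the dereference return 60 (the slide's crash), while a base of 16 returns 30.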
18/23
Partition. Mobile source:
    void runGame () {
        while (!gameover) {
            Move mv;
            mv = getPlayerTurn ();
            pieces[mv.tar] = mv.to;
            mv = getAITurn ();
            pieces[mv.tar] = mv.to;
        }
    }
Server source:
    void listenClient () {
        FunctionID offID;
        while (true) {
            receiveData ();
            executeFunction (offID);
            sendData ();
        }
    }
In the mobile binary, the call to getAITurn becomes an offload request (Offload → Execute → Return):
    requestOffload (getAITurn);
    sendData ();
    receiveData ();
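A single-process sketch of this offload handshake, assuming a function table indexed by `FunctionID` on the server and a shared struct standing in for the network. Unlike the slide, where results travel back through receiveData () and synchronized memory, the sketch returns the `Move` explicitly. `Channel`, `request_offload`, `serve_one`, `receive_result`, the `Move` field types, and the stub `getAITurn` are hypothetical.

```c
#include <assert.h>

/* Hypothetical Move layout; the slides use only mv.tar and mv.to. */
typedef struct { int tar; int to; } Move;

typedef Move (*OffloadFn)(void);
typedef enum { FN_GET_AI_TURN, NUM_OFFLOAD_FNS } FunctionID;

/* Stub standing in for the real AI search. */
static Move getAITurn(void) {
    Move mv = { 3, 7 };
    return mv;
}

/* Server-side dispatch table: FunctionID -> entry point. */
static const OffloadFn offload_table[NUM_OFFLOAD_FNS] = {
    [FN_GET_AI_TURN] = getAITurn,
};

/* Stand-in for the network link: one request and one result. */
typedef struct {
    FunctionID id;
    Move result;
} Channel;

/* Mobile: send the function ID instead of calling the function. */
static void request_offload(Channel *ch, FunctionID id) { ch->id = id; }

/* Server: one iteration of listenClient's receive/execute/send loop. */
static void serve_one(Channel *ch) {
    assert(ch->id < NUM_OFFLOAD_FNS);
    ch->result = offload_table[ch->id]();
}

/* Mobile: receive the result of the offloaded call. */
static Move receive_result(const Channel *ch) { return ch->result; }
```

Sending a small function ID rather than code keeps the request cheap, since both binaries come from the same source and share the table.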
19/23
Server-Specific Optimization.
20/23
21/23
:: Benchmarks excluded: 400.perlbench, 403.gcc (LLVM compile error); 197.parser, 254.gap, 255.vortex (no profitable target)
:: Mobile: 2.5 GHz quad-core Krait 400; Server: 3.60 GHz quad-core Intel i7-4790
:: Network: 802.11n (maximum 144 Mbps) and 802.11ac (maximum 844 Mbps)
22/23
Figure: normalized execution time (0.0 to 1.0, lower is faster) for Native Offloader (144 Mbps), Native Offloader (844 Mbps), and ideal offloading. Speed-up!
23/23
Figure: normalized battery consumption (0.0 to 1.0) for Native Offloader (144 Mbps), Native Offloader (844 Mbps), and ideal offloading.
24/23
:: 6.42x Speed-up / 82% Battery saving for 17 SPEC 2000/2006 C Benchmarks
Image: http://www.worth1000.com/contests/27097/interspecies-teamwork-2
Woo-hoo!
The paper also covers remote I/O, function pointer mapping, pointer size and endianness translation, the runtime system, and communication optimizations.
25/23
26/23
Diagram (page table and physical pages). Stage: Initialization; Stage: Local Execution.
27/23
Diagram (page table and physical pages). Stage: Offloading Execution, with page faults and dirty pages.
28/23
Diagram (page table and physical pages). Stage: Finalization, with dirty pages.
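A toy model of the stages on these slides, assuming explicit `present` and `dirty` flags in place of page-table protection: the server fetches a page from the mobile on first access (the simulated page fault), marks pages it writes as dirty, and at finalization copies only dirty pages back. The page size, page count, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_PAGES 4
#define PAGE_SIZE 16

typedef struct {
    unsigned char data[NUM_PAGES][PAGE_SIZE];
    bool present[NUM_PAGES];   /* page already fetched to this side */
    bool dirty[NUM_PAGES];     /* page written during offloaded execution */
} Memory;

static int faults;   /* number of simulated page faults */

/* Server access to page p: fetch it from the mobile on first touch. */
static unsigned char *server_page(Memory *server, const Memory *mobile, int p) {
    assert(p >= 0 && p < NUM_PAGES);
    if (!server->present[p]) {            /* simulated page fault */
        memcpy(server->data[p], mobile->data[p], PAGE_SIZE);
        server->present[p] = true;
        faults++;
    }
    return server->data[p];
}

/* Server write: fetch the page if needed, then mark it dirty. */
static void server_write(Memory *server, const Memory *mobile,
                         int p, int off, unsigned char v) {
    assert(off >= 0 && off < PAGE_SIZE);
    server_page(server, mobile, p)[off] = v;
    server->dirty[p] = true;
}

/* Finalization: copy only dirty pages back; returns how many were sent. */
static int finalize(Memory *mobile, const Memory *server) {
    int sent = 0;
    for (int p = 0; p < NUM_PAGES; p++)
        if (server->dirty[p]) {
            memcpy(mobile->data[p], server->data[p], PAGE_SIZE);
            sent++;
        }
    return sent;
}
```

A page that is only read faults once but is never sent back, which is why tracking dirty pages keeps the return transfer small.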
29/23
Pointer analysis-based systems: Li et al. (CASES'01), Wang and Li (PLDI'04). Diagram: instructions (Instr.) and the memory objects (Obj.) they access.
30/23
31/23
Installing Pointer Size & Endianness Translators
Diagram: Pointer Size Translation widens 01 23 45 67 into 01 23 45 67 00 00 00 00; Endianness Translation is shown alongside.
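A sketch of both translations on raw bytes, assuming the slide's byte sequences are listed in memory order and come from a little-endian 32-bit mobile going to a 64-bit server: widening zero-extends the 4-byte pointer into an 8-byte slot, which reproduces 01 23 45 67 → 01 23 45 67 00 00 00 00, and the endianness translator reverses byte order. `load_le32`, `widen_ptr_le`, and `bswap32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Reads a 32-bit value stored little-endian in in[0..3]. */
static uint32_t load_le32(const unsigned char in[4]) {
    return (uint32_t)in[0] | (uint32_t)in[1] << 8 |
           (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
}

/* Pointer size translation: zero-extends a 4-byte little-endian mobile
   pointer into an 8-byte little-endian server slot. */
static void widen_ptr_le(const unsigned char in[4], unsigned char out[8]) {
    assert(in && out);
    uint64_t v = load_le32(in);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Endianness translation: reverses the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}
```

Working byte-by-byte keeps the translation independent of the host's own endianness.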
32/23
Diagram: gvar and gvar_re on the mobile and on the server, across the heap and text regions.
33/23
Installing Remote I/O APIs
Diagram: output such as "Hello World!" produced on the server is sent to the mobile.
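A minimal sketch of a remote output API, assuming a hypothetical `remote_printf` installed in place of `printf` in the server binary and a string buffer standing in for both the mobile's console and the network link. It formats on the server with the standard `vsnprintf` and forwards the bytes, so the text appears on the mobile device; `send_to_mobile` and `mobile_console` are also hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the mobile console: forwarded output is appended here
   instead of being sent over the network. */
static char mobile_console[256];

/* Hypothetical transport: delivers len bytes to the mobile side. */
static void send_to_mobile(const char *data, size_t len) {
    size_t used = strlen(mobile_console);
    assert(used + len < sizeof mobile_console);
    memcpy(mobile_console + used, data, len);
    mobile_console[used + len] = '\0';
}

/* Server-side replacement for printf: formats locally, then forwards the
   bytes so they appear on the mobile device rather than the server. */
static int remote_printf(const char *fmt, ...) {
    char buf[128];
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0)
        return n;
    /* Forward only what fit in buf if the output was truncated. */
    size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
    send_to_mobile(buf, len);
    return n;
}
```

Formatting on the server and shipping only the resulting bytes keeps the mobile side simple: it just appends received text to its output.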
34/23
Figure: 445.gobmk on a fast network (802.11ac) and on a slow network (802.11n).
35/23
    void runGame () {
        while (!gameover) {
            Move mv;
            mv = getPlayerTurn ();
            pieces[mv.tar] = mv.to;
            requestOffload (getAITurn);
            sendData ();
            receiveData ();
            pieces[mv.tar] = mv.to;
        }
    }
36/23
Figure: per-benchmark breakdown (0.0 to 1.0) into Useful Work, Function Pointer Translation, Remote I/O, and Communication, for slow (s) and fast (f) networks. Overhead sources noted on the figure: large data size and ineffective data compression; frequent remote input; an algorithm that uses function pointers frequently.
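Why frequent function pointers cost time, sketched under assumptions: a function's address in the mobile binary differs from its address in the server binary, so a pointer value read from mobile memory is looked up in a mapping table before each indirect call. The table entries, the mobile addresses, `translate_fn_ptr`, and the `eval_*` functions are all hypothetical; the slides do not describe the table's format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*EvalFn)(int);

/* Hypothetical functions reached through function pointers. */
static int eval_material(int x) { return 2 * x; }
static int eval_mobility(int x) { return x + 1; }

/* Hypothetical mapping: address in the mobile binary -> server entry point. */
typedef struct {
    uint32_t mobile_addr;
    EvalFn server_fn;
} FnMapEntry;

static const FnMapEntry fn_map[] = {
    { 0x00010400u, eval_material },
    { 0x00010480u, eval_mobility },
};

/* Translates a function pointer value received from the mobile side;
   every indirect call through such a pointer pays for this lookup. */
static EvalFn translate_fn_ptr(uint32_t mobile_addr) {
    for (size_t i = 0; i < sizeof fn_map / sizeof fn_map[0]; i++)
        if (fn_map[i].mobile_addr == mobile_addr)
            return fn_map[i].server_fn;
    return NULL;
}
```

The linear search stands in for whatever lookup structure a real runtime would use; the point is that the cost recurs on every indirect call, so algorithms built around function pointers accumulate it.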
37/23
Thank you for your attention!
Compiler Research Lab (CORELAB)
POSTECH, Korea
Gwangmu Lee
iss300@postech.ac.kr