Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy
En Li, Zhi Zhou, Xu Chen (Sun Yat-Sen University)


SLIDE 1

Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

En Li, Zhi Zhou, Xu Chen
Sun Yat-Sen University
School of Data and Computer Science

SLIDE 2

The Rise of Artificial Intelligence

◼ Deep learning is a popular technique that has been applied in many fields.

[Figure: image semantic segmentation, voice recognition, object detection]

SLIDE 3

Why Is Deep Learning Successful?

◼ Deep neural networks are a key reason for the rapid development of deep learning.

SLIDE 4

The Headache of Deep Learning

◼ Deep learning applications cannot be well supported by today's mobile devices due to their large computational demands.

[Figure: AlexNet parameters & FLOPs; AlexNet layer latency on Raspberry Pi & layer output data size]

SLIDE 5

What about Cloud Computing?

◼ Under a cloud-centric approach, large amounts of data are uploaded to the remote cloud, resulting in high end-to-end latency and energy consumption.

[Figure: cloud computing paradigm; AlexNet performance under different bandwidths]

SLIDE 6

Exploiting Edge Computing

◼ By pushing cloud capabilities from the network core to the network edges (e.g., base stations and Wi-Fi access points) in close proximity to devices, edge computing enables low-latency and energy-efficient performance.

SLIDE 7

Existing Efforts in Edge Intelligence

Framework | Highlight
Neurosurgeon (ASPLOS 2017) | Deep learning model partitioning between cloud and mobile device, with intermediate data offloading
Delivering Deep Learning to Mobile Devices via Offloading (SIGCOMM VR/AR Network 2017) | Offloading video input to an edge server according to network conditions
DeepX (IPSN 2016) | Deep learning model partitioned across different local processors
CoINF (arXiv 2017) | Deep learning model partitioning between smartphones and wearables

Existing efforts focus on data offloading and local optimization.

SLIDE 8

System Design

Our Goal

◼ Through collaboration between the edge server and the mobile device, we want to tune the latency of deep learning model inference on demand.

Two Design Knobs

◼ Deep Learning Model Partition ◼ Deep Learning Model Right-sizing

SLIDE 9

Two Design Knobs

◼ Deep Learning Model Partition ◼ Deep Learning Model Right-sizing

Deep Learning Model Partition [1]

[Figure: AlexNet layer latency on Raspberry Pi & layer output data size]

[1] Kang, Yiping, et al. "Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge." ASPLOS, ACM, 2017, pp. 615-629.
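The slides cite Neurosurgeon-style partitioning but give no algorithm, so here is a minimal sketch (an illustration under assumed names, not the authors' implementation) of picking a partition point from predicted per-layer latencies and layer output sizes:

```python
# Minimal sketch of latency-driven model partitioning (assumed structure,
# not the authors' code). Layers 0..p-1 run on the device; layers p..n-1
# run on the edge server.

def best_partition(device_lat, server_lat, out_size, input_size, bandwidth):
    """device_lat/server_lat: per-layer runtimes (s); out_size: per-layer
    output sizes (bytes); bandwidth: bytes/s. Returns (point, total latency)."""
    n = len(device_lat)
    best_point, best_total = 0, float("inf")
    for p in range(n + 1):
        if p == 0:
            transfer = input_size / bandwidth       # upload the raw input
        elif p == n:
            transfer = 0.0                          # fully local inference
        else:
            transfer = out_size[p - 1] / bandwidth  # upload the feature map
        total = sum(device_lat[:p]) + transfer + sum(server_lat[p:])
        if total < best_total:
            best_point, best_total = p, total
    return best_point, best_total
```

The loop makes the slide's point concrete: a low bandwidth pushes the best point toward layers with small outputs (or fully local execution), while a high bandwidth favors offloading early.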
SLIDE 10

Two Design Knobs

◼ Deep Learning Model Partition ◼ Deep Learning Model Right-sizing

[Figure: AlexNet with BranchyNet [2] structure]

[2] Teerapittayanon, Surat, B. McDanel, and H. T. Kung. "BranchyNet: Fast inference via early exiting from deep neural networks." ICPR, IEEE, 2017, pp. 2464-2469.
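To make the right-sizing knob concrete, here is a minimal sketch of BranchyNet-style early exit using a per-branch entropy threshold; the branch callables and thresholds are hypothetical, and this is not the BranchyNet code itself:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(p):
    p = p[p > 0]                       # avoid log(0)
    return float(-(p * np.log(p)).sum())

def branchy_infer(x, branches, thresholds):
    """branches: callables mapping an input to class logits at successive
    exit points; thresholds: per-branch entropy cutoffs."""
    p = None
    for branch, thr in zip(branches, thresholds):
        p = softmax(branch(x))
        if entropy(p) < thr:           # confident enough: exit early
            return int(p.argmax())
    return int(p.argmax())             # final exit: always classify
```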

SLIDE 11

A Tradeoff

◼ Early exit naturally gives rise to a latency-accuracy tradeoff (i.e., exiting early harms inference accuracy).

[Figure: AlexNet with BranchyNet [2] structure]

SLIDE 12

Problem Definition

◼ For mission-critical applications that typically have a predefined latency requirement, our framework maximizes inference accuracy without violating the latency requirement.
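Written as an optimization problem (the notation below is assumed; the slide gives only prose), the framework chooses an exit point i and a partition point p as follows:

```latex
% A(i): accuracy of the model truncated at exit point i;
% L(i, p): predicted end-to-end latency when the first p layers run on the
% device and the rest on the edge server; L_max: latency requirement.
\begin{aligned}
\max_{i,\,p}\quad & A(i) \\
\text{s.t.}\quad  & L(i, p) \le L_{\max}
\end{aligned}
```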

SLIDE 13

System Overview

◆ Offline Training Stage ◆ Online Optimization Stage ◆ Co-Inference Stage

SLIDE 14

System Overview

◆ Offline Training Stage ◆ Online Optimization Stage ◆ Co-Inference Stage
➢ Training regression models for layer runtime prediction
➢ Training AlexNet with the BranchyNet structure
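A minimal sketch of the layer-runtime profiling step, assuming profiled (layer configuration, measured latency) samples and an ordinary least-squares fit; the API is illustrative, not the authors' code:

```python
import numpy as np

def fit_layer_model(features, latencies):
    """Fit y = w . x + b for one layer type.
    features: (n_samples, n_vars) layer configurations;
    latencies: (n_samples,) measured runtimes in seconds."""
    X = np.hstack([np.asarray(features, float),
                   np.ones((len(features), 1))])   # append bias column
    coef, *_ = np.linalg.lstsq(X, np.asarray(latencies, float), rcond=None)
    return coef                                    # [w1, ..., wk, b]

def predict_latency(coef, x):
    return float(np.dot(coef[:-1], x) + coef[-1])
```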

SLIDE 15

System Overview

◆ Offline Training Stage ◆ Online Optimization Stage ◆ Co-Inference Stage
➢ Searching for the exit point and the partition point
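A minimal sketch of the online search, under the assumption that it enumerates (exit point, partition point) pairs and keeps the most accurate one within the latency budget; it reuses best_partition() from the earlier sketch, and the exit_models structure is hypothetical:

```python
def joint_search(exit_models, latency_budget, bandwidth):
    """exit_models: list ordered by exit point, each a dict with keys
    'accuracy', 'device_lat', 'server_lat', 'out_size', 'input_size'."""
    best = None  # (accuracy, exit_point, partition_point)
    for i, m in enumerate(exit_models):
        p, total = best_partition(m["device_lat"], m["server_lat"],
                                  m["out_size"], m["input_size"], bandwidth)
        if total <= latency_budget and (best is None or m["accuracy"] > best[0]):
            best = (m["accuracy"], i, p)
    return best  # None if no configuration meets the budget
```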

SLIDE 16

System Overview

◆ Offline Training Stage ◆ Online Optimization Stage ◆ Co-Inference Stage
➢ Select an exit point and find the corresponding partition point

SLIDE 17

Experimental Setup

◼ Deep Learning Model
 AlexNet with five exit points (built on the Chainer deep learning framework)
 Dataset: CIFAR-10
 Trained on a server with 4 Tesla P100 GPUs
◼ Local Device: Raspberry Pi 3B
◼ Edge Server: a desktop PC with a quad-core Intel processor at 3.4 GHz and 8 GB of RAM

SLIDE 18

Experiments

Regression Model

Table 1: The independent variables of the regression models

SLIDE 19

Experiments

Regression Model

Table 2: Regression Models

Layer | Edge Server Model | Mobile Device Model
Convolution | y = 6.03e-5 * x1 + 1.24e-4 * x2 + 1.89e-1 | y = 6.13e-3 * x1 + 2.67e-2 * x2 - 9.909
ReLU | y = 5.6e-6 * x + 5.69e-2 | y = 1.5e-5 * x + 4.88e-1
Pooling | y = 1.63e-5 * x1 + 4.07e-6 * x2 + 2.11e-1 | y = 1.33e-4 * x1 + 3.31e-5 * x2 + 1.657
Local Response Normalization | y = 6.59e-5 * x + 7.80e-2 | y = 5.19e-4 * x + 5.89e-1
Dropout | y = 5.23e-6 * x + 4.64e-3 | y = 2.34e-6 * x + 0.0525
Fully-Connected | y = 1.07e-4 * x1 - 1.83e-4 * x2 + 0.164 | y = 9.18e-4 * x1 + 3.99e-3 * x2 + 1.169
Model Loading | y = 1.33e-6 * x + 2.182 | y = 4.49e-6 * x + 842.136
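As a usage illustration only (the meanings of x, x1, and x2 are defined in Table 1, which is not reproduced here, and the latency units are those of the deck), the fitted convolution models can be evaluated directly:

```python
# Evaluate Table 2's convolution latency models; x1 and x2 are the
# Table 1 variables, which this sketch does not define.
def conv_latency_edge(x1, x2):
    return 6.03e-5 * x1 + 1.24e-4 * x2 + 1.89e-1

def conv_latency_device(x1, x2):
    return 6.13e-3 * x1 + 2.67e-2 * x2 - 9.909
```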

SLIDE 20

Experiments

Result

◼ Selection under different bandwidths

Higher bandwidth leads to higher accuracy.

SLIDE 21

Experiments

◼ Inference latency under different bandwidths

Our regression-based approach accurately estimates the actual runtime latency of the deep learning model.

SLIDE 22

Experiments

◼ Selection under different latency requirements

A larger latency goal gives more room for accuracy improvement.

SLIDE 23

Experiments

◼ Comparison with other methods

Inference accuracy comparison under different latency requirements.

SLIDE 24

Key Take-Aways

◼ On-demand acceleration of deep learning model inference through device-edge synergy
◼ Two design knobs: Deep Learning Model Partition and Deep Learning Model Right-sizing
◼ Implementation and evaluations demonstrate the effectiveness of our framework

SLIDE 25

Future Work

◼ More Devices ◼ Energy Consumption

SLIDE 26

Future Work

◼ Deep Reinforcement Learning Technique

[Figure: deep reinforcement learning for model partition]

SLIDE 27

Thank you

Contact:
lien5@mail2.sysu.edu.cn
zhouzhi9@mail.sysu.edu.cn
chenxu35@mail.sysu.edu.cn