Cost-effective Hardware Accelerator Recommendation for Edge - - PowerPoint PPT Presentation



SLIDE 1

Institute for Software Integrated Systems

Vanderbilt University Xingyu Zhou, Robert Canady, Shunxing Bao, Aniruddha Gokhale DOC-VU Group, Dept of EECS Vanderbilt University, Nashville, TN 37235

Cost-effective Hardware Accelerator Recommendation for Edge Computing

SLIDE 2

Outline

▪ Current Edge HW Acc Status
▪ Challenge for HW Acc Deployment
▪ Solution Overview
▪ Case Study
▪ Conclusion

SLIDE 3

What are HW Accelerators?

▪ Accelerate computations
▪ Target general or specific task settings

CPU (most general)
GPU (better suited for stream processing)
FPGA (general in theory but difficult to use)
ASIC (specific)

SLIDE 4

Why Hardware Accelerators on Edge?

▪ Heterogeneous data sources from sensors
▪ More compute-intensive processing requirements, especially for image or video
▪ Realistic physical constraints (power, size, cost, etc.)

SLIDE 5

Challenge: which accelerator is best suited for application needs?

▪ Too many candidate hardware devices for the edge

+

▪ Current selection and evaluation research covers either a single device or low-level circuit design

=

▪ Need to understand the applicability of these accelerator technologies for at-scale, edge-based applications

SLIDE 6

Metrics for HW Acceleration Evaluation

▪ Latency => Application Response
▪ Power => Electricity Cost
▪ Commercial Cost => Market Price

  • V. Sze, T.-J. Yang, Y.-H. Chen, J. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, December 2017.

SLIDE 7

Overall Goal for HW Selection

▪ Define one HW acceleration strategy:

(1) HW acceleration task realization on device
(2) HW acceleration device placement (location, time)

▪ Minimize deployment cost under constraints

Current goal: minimize cost subject to a design latency limit
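
The current goal can be read as a small constrained selection problem: among the candidate devices, keep those whose latency is within the design limit and pick the cheapest. A minimal sketch follows; the `Candidate` type, the function name, and all numbers are hypothetical placeholders, not values from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One accelerator option (hypothetical structure for illustration)."""
    name: str
    latency_ms: float  # measured per-inference latency
    cost_usd: float    # estimated total deployment cost


def cheapest_within_latency(
    candidates: Sequence[Candidate], latency_limit_ms: float
) -> Optional[Candidate]:
    """Return the lowest-cost candidate meeting the latency limit, or None."""
    feasible = [c for c in candidates if c.latency_ms <= latency_limit_ms]
    return min(feasible, key=lambda c: c.cost_usd, default=None)


# Placeholder numbers for illustration only, not measurements from the slides.
candidates = [
    Candidate("device_a", latency_ms=40.0, cost_usd=900.0),
    Candidate("device_b", latency_ms=120.0, cost_usd=300.0),
    Candidate("device_c", latency_ms=60.0, cost_usd=500.0),
]
best = cheapest_within_latency(candidates, latency_limit_ms=100.0)
print(best.name if best else "no feasible device")
```

Here `device_b` is cheapest overall but violates the 100 ms limit, so the cheaper of the two feasible devices is selected.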

SLIDE 8

Cost Evaluation Workflow Part I

  • 1. Application design

Choose applications that can be accelerated: ResNet50 (Classification) + TinyYolo (Detection)

  • 2. Hardware configuration

Go through the hardware design flows

SLIDE 9

Cost Evaluation Workflow Part II

  • 3. Per-Device Benchmarking

Record time and power consumption

  • 4. Deployment Cost Approximation

= devCost (hardware market price) + deployCost (for design topology and time cycle)

  • 5. Choose a device that meets the requirements
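
The cost approximation in step 4 can be sketched as below. The slides only give the sum devCost + deployCost; treating deployCost as the electricity cost of running every device in the topology for the whole cycle is an assumption made here, as are the function name, the parameters, and the example numbers.

```python
HOURS_PER_MONTH = 730.0  # assumed average month length (8760 h / 12)


def deployment_cost(
    unit_price_usd: float,
    num_devices: int,
    avg_power_w: float,
    months: float,
    usd_per_kwh: float,
) -> float:
    """Approximate total cost as devCost + deployCost.

    Assumption: deployCost is modeled purely as electricity cost for
    num_devices running continuously for the given number of months.
    """
    dev_cost = unit_price_usd * num_devices
    energy_kwh = avg_power_w / 1000.0 * HOURS_PER_MONTH * months * num_devices
    deploy_cost = energy_kwh * usd_per_kwh
    return dev_cost + deploy_cost


# Placeholder inputs for illustration only.
total = deployment_cost(
    unit_price_usd=100.0,
    num_devices=12,
    avg_power_w=10.0,
    months=24,
    usd_per_kwh=0.10,
)
print(f"approximate total cost: {total:.2f} USD")
```

With these placeholder inputs the hardware accounts for 1200 USD and electricity for roughly 210 USD, which shows how the power term grows with the length of the deployment cycle.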
SLIDE 10

Per-device Applicability Validation

Applicability Test on Relatively High-Dimensional Data:

▪ Object Classification tasks on a set of 500 images with a resolution of 640 × 480.
▪ Vehicle Detection tasks on a road traffic video consisting of 874 frames with a resolution of 1280 × 720.
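
A per-device timing run over such a dataset could look like the sketch below. The `infer` callable is a hypothetical stand-in for the device-specific inference call, and power measurement is left out because it depends on each device's tooling.

```python
import time
from typing import Any, Callable, Dict, Iterable


def benchmark(infer: Callable[[Any], Any], inputs: Iterable[Any]) -> Dict[str, float]:
    """Time each call to infer and summarize latency and throughput."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        infer(item)
        latencies.append(time.perf_counter() - start)
    if not latencies:
        raise ValueError("inputs must contain at least one item")
    total = sum(latencies)
    return {
        "count": len(latencies),
        "mean_latency_s": total / len(latencies),
        "max_latency_s": max(latencies),
        "throughput_per_s": len(latencies) / total if total > 0 else float("inf"),
    }


# Stand-in workload: 500 dummy "images" and a trivial function in place of
# a real model call (both are placeholders, not the setup from the slides).
stats = benchmark(lambda image: sum(range(1000)), range(500))
print(stats)
```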

SLIDE 11

At-Scale Approximation

Design Topology Potential Scenarios:

  • 1. Unmanned shopping using object classification

  • 2. Surveillance using detection

Reliability-Driven System Deployment Goal:

  • 1. Each of the 3 fog groups must be guaranteed to handle at least half (2 of 4) of its input loads, with an overall confidence level of 99%

  • 2. Edge node inputs follow a normal distribution (assumed identical for all nodes in this topology)

  • 3. Edge node inputs have a relatively high uncertainty level, with stdFreq_in = muFreq_in (inputCV = 1.0)
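
One possible reading of this goal, as a sketch rather than the authors' actual model: each fog group must absorb the combined input of 2 edge nodes, node inputs are independent and identically normal, and the 3 groups are independent, so each group needs a confidence of 0.99^(1/3). The function name and the independence assumptions are introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def required_group_capacity(
    mu_freq_in: float,
    input_cv: float = 1.0,
    nodes_served: int = 2,
    num_groups: int = 3,
    overall_confidence: float = 0.99,
) -> float:
    """Input rate one fog group must sustain under the stated assumptions."""
    # Independent groups: per-group confidence multiplies up to the overall one.
    per_group_confidence = overall_confidence ** (1.0 / num_groups)
    sigma = input_cv * mu_freq_in            # stdFreq_in = inputCV * muFreq_in
    mean_load = nodes_served * mu_freq_in    # sum of independent normal inputs
    std_load = sqrt(nodes_served) * sigma
    z = NormalDist().inv_cdf(per_group_confidence)
    return mean_load + z * std_load


# Placeholder mean input frequency of 10 requests per second per edge node.
capacity = required_group_capacity(mu_freq_in=10.0)
print(f"required capacity per fog group: {capacity:.1f} requests/s")
```

With inputCV = 1.0 the required capacity is close to three times the mean load of the two nodes, which illustrates how strongly input uncertainty drives provisioning.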

SLIDE 12

At-Scale Approximation

Bandwidth Setting: standard IEEE 802.11 Wi-Fi at 135 Mbps
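
To see what this bandwidth implies for offloading, the sketch below estimates the transfer time of one 1280 × 720 frame. It assumes an uncompressed 24-bit RGB frame and ignores protocol overhead and contention, neither of which is specified in the slides.

```python
def transfer_time_s(payload_bytes: int, link_mbps: float = 135.0) -> float:
    """Ideal transfer time of a payload over a link of link_mbps megabits/s."""
    return payload_bytes * 8 / (link_mbps * 1_000_000)


# Assumption: raw RGB frame, 3 bytes per pixel, no compression.
frame_bytes = 1280 * 720 * 3
frame_time = transfer_time_s(frame_bytes)
max_fps = 1.0 / frame_time
print(f"{frame_time:.3f} s per frame, at most {max_fps:.1f} frames/s")
```

Under these assumptions a raw frame takes about 0.16 s to send, so the link alone caps a single stream at roughly 6 frames per second unless frames are compressed or processed at the edge.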

SLIDE 13

At-Scale Approximation

Settings: increasing input strength over a 24-month deployment cycle

  • 1. Why are hardware accelerators necessary?

CPUs (RaspPi at the edge, FX6300 in the cloud) are the worst options

  • 2. Power is critical for long-term deployment

The two most cost-efficient options for the edge: Ultra96 (FPGA) and Jetson Nano (embedded GPU)

  • 3. Device tradeoffs:

FPGAs are hard to use; the NCS is not powerful

SLIDE 14

Summary & Limitations

Presents a simple evaluation procedure that serves as a recommendation system, helping users select an accelerator hardware device for applications deployed across the cloud-to-edge spectrum.

Cons:

  • 1. Only a pure strategy with a single type of device is considered
  • 2. A single type of acceleration task is set for all devices

Plan to investigate at-scale deployment of RNN and GAN in edge scenarios

  • 3. Ideal device task scheduling and device parallelism are assumed
  • 4. Interference effects between device executions have not been taken into consideration

SLIDE 15

Thank You! Q&A