SLIDE 1
Cost-effective Hardware Accelerator Recommendation for Edge Computing
Institute for Software Integrated Systems, Vanderbilt University
Cost-effective Hardware Accelerator Recommendation for Edge Computing
Xingyu Zhou, Robert Canady, Shunxing Bao, Aniruddha Gokhale
DOC-VU Group, Dept of EECS, Vanderbilt University
SLIDE 2
SLIDE 3
What are HW Accelerators?
▪ Accelerating computations
▪ For general or specific task settings
CPU (most general)
GPU (better suited for stream processing)
FPGA (general in theory but difficult to use)
ASIC (specific)
SLIDE 4
Why Hardware Accelerators on Edge?
▪ Heterogeneous data sources from sensors
▪ More compute-intensive processing requirements, especially for image or video data
▪ Realistic physical constraints (power, size, cost, etc.)
SLIDE 5
Challenge: which accelerator is best suited for application needs?
▪ Too many different hardware devices are potential candidates for the edge
+
▪ Current selection and evaluation research covers either a single device or even low-level circuit design
=
▪ Need to understand the applicability of these accelerator technologies for at-scale, edge-based applications
SLIDE 6
Metrics for HW Acceleration Evaluation
▪ Latency => Application Response
▪ Power => Electricity Cost
▪ Commercial Cost => Market Price
- V. Sze, T.-J. Yang, Y.-H. Chen, J. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and
Survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, December 2017.
SLIDE 7
Overall Goal for HW Selection
▪ Define One HW Acceleration Strategy:
(1) HW Acceleration Task Realization on Device
(2) HW Acceleration Device Placement (location, time)
▪ Minimize deployment cost under constraints
Current goal: minimize cost subject to a design latency limit
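One way to write the current goal compactly is as a constrained minimization. The symbols below are illustrative assumptions, not notation from the slides: D is the set of candidate devices, C(d) the deployment cost of choosing device d, T(d) its measured latency, and T_max the design latency limit.

```latex
\min_{d \in \mathcal{D}} \; C(d)
\quad \text{subject to} \quad
T(d) \le T_{\max}
```

Under this reading, a device that misses the latency limit is excluded regardless of how cheap it is.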
SLIDE 8
Cost Evaluation Workflow Part I
- 1. Application design
choose applications that can be accelerated
ResNet50 (Classification) + TinyYolo (Detection)
- 2. Hardware configuration
go through design flows
SLIDE 9
Cost Evaluation Workflow Part II
- 3. Per-Device Benchmarking
record time and power consumption
- 4. Deployment Cost Approximation
= devCost (hardware market price) + deployCost (for design topology and time cycle)
- 5. Choose a device that meets the requirements
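Steps 4 and 5 can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes that deployCost is dominated by electricity over the deployment cycle and that devCost scales with the number of devices; the slides only say deployCost depends on the design topology and time cycle. All names (`Device`, `deploy_cost`, `recommend`) and all numbers are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


@dataclass
class Device:
    name: str
    dev_cost: float   # hardware market price per unit (hypothetical, USD)
    latency_s: float  # benchmarked per-task latency in seconds (hypothetical)
    power_w: float    # benchmarked average power draw in watts (hypothetical)


def deploy_cost(dev, n_devices, months, usd_per_kwh):
    """Electricity cost of running n_devices continuously for `months`."""
    kwh = dev.power_w / 1000 * HOURS_PER_MONTH * months * n_devices
    return kwh * usd_per_kwh


def total_cost(dev, n_devices, months, usd_per_kwh):
    """devCost + deployCost under the assumptions stated above."""
    return n_devices * dev.dev_cost + deploy_cost(dev, n_devices, months, usd_per_kwh)


def recommend(devices, latency_limit_s, n_devices, months, usd_per_kwh):
    """Return the cheapest device that meets the latency limit, or None."""
    feasible = [d for d in devices if d.latency_s <= latency_limit_s]
    if not feasible:
        return None
    return min(feasible, key=lambda d: total_cost(d, n_devices, months, usd_per_kwh))


# Placeholder candidates; these are not measurements from the slides.
candidates = [
    Device("cpu_board", dev_cost=50.0, latency_s=2.0, power_w=5.0),
    Device("accel_board", dev_cost=100.0, latency_s=0.1, power_w=10.0),
]
choice = recommend(candidates, latency_limit_s=0.5, n_devices=4, months=24, usd_per_kwh=0.12)
print(choice.name if choice else "no feasible device")
```

Filtering on latency first and then minimizing cost mirrors the stated goal: latency acts as a hard constraint, and cost is the objective.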
SLIDE 10
Per-device Applicability Validation
Applicability Test on Relatively High-Dimensional Data:
Object classification tasks on a set of 500 images with a resolution of 640 × 480.
Vehicle detection tasks on a road traffic video consisting of 874 frames with a resolution of 1280 × 720.
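A per-device run over such an input set could be timed roughly as follows. This is a generic sketch, assuming the model is exposed as a Python callable (`infer` is a hypothetical stand-in) and that average power is read from an external meter or a vendor tool, since power measurement is device-specific and is not described on the slides.

```python
import time


def benchmark(infer, inputs):
    """Time `infer` over all inputs; return total time, mean latency, throughput."""
    n = len(inputs)
    if n == 0:
        raise ValueError("inputs must not be empty")
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    throughput = n / elapsed if elapsed > 0 else float("inf")
    return {"total_s": elapsed, "latency_s": elapsed / n, "throughput_per_s": throughput}


def energy_joules(avg_power_w, elapsed_s):
    """Energy = average power x time; avg_power_w comes from an external measurement."""
    return avg_power_w * elapsed_s


# Stand-in workload: 500 dummy "images" and a trivial function instead of a real model.
stats = benchmark(lambda x: x * 2, list(range(500)))
print(stats["latency_s"], energy_joules(5.0, stats["total_s"]))
```

On a real device the first few inferences are often slower (model load, cache warm-up), so a warm-up pass before timing is a common refinement.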
SLIDE 11
At-Scale Approximation
Design Topology
Potential Scenarios:
- 1. Unmanned shopping using object classification
- 2. Surveillance using detection
Reliability-Driven System Deployment Goal:
- 1. Should guarantee handling no less than half (2 of 4) of the input loads from every fog group (3 groups) with an overall confidence level of 99%
- 2. Edge node inputs are modeled by a normal distribution (assumed identical for all nodes in this topology)
- 3. Edge node inputs have a relatively high uncertainty level, with stdFreq_in = muFreq_in (inputCV = 1.0)
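The reliability goal above can be turned into a capacity estimate under some additional assumptions that are not stated on the slides: node inputs are independent, the fog groups are independent (so the 99% overall target is split evenly across them), and a group must cover the aggregate load of the nodes it is required to serve. The function names below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def per_group_confidence(overall_confidence, n_groups):
    """Per-group confidence so that n independent groups jointly meet the overall target."""
    return overall_confidence ** (1.0 / n_groups)


def required_capacity(mu_freq_in, input_cv, n_nodes, confidence):
    """Capacity covering the summed load of n_nodes i.i.d. normal inputs.

    The sum of n i.i.d. Normal(mu, sigma) variables is Normal(n*mu, sqrt(n)*sigma),
    so the capacity is the `confidence` quantile of that distribution.
    """
    sigma = input_cv * mu_freq_in
    z = NormalDist().inv_cdf(confidence)
    return n_nodes * mu_freq_in + z * sigma * sqrt(n_nodes)


# Illustrative numbers only: mean input rate of 10 requests/s per node,
# inputCV = 1.0, 2 nodes to be served per group, 3 groups, 99% overall.
conf = per_group_confidence(0.99, 3)
print(required_capacity(10.0, 1.0, 2, conf))
```

With inputCV = 1.0 a normal model assigns noticeable probability to negative input rates, so this estimate is best read as a rough upper-tail approximation.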
SLIDE 12
At-Scale Approximation
Bandwidth Setting: standard IEEE 802 Wi-Fi at 135 Mbps
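To get a feel for what this link rate means for offloading, the transfer time of a single frame can be estimated as below. The sketch assumes an uncompressed 3-byte-per-pixel frame and that the full nominal 135 Mbps is available with no protocol overhead; neither assumption comes from the slides, and real transfers would normally use compressed frames.

```python
def transfer_time_s(payload_bytes, link_mbps=135.0):
    """Seconds needed to send payload_bytes over a link of link_mbps (megabits/s)."""
    return payload_bytes * 8 / (link_mbps * 1e6)


# Raw RGB frame at the 1280 x 720 resolution used in the detection test.
raw_frame_bytes = 1280 * 720 * 3
t = transfer_time_s(raw_frame_bytes)
print(f"{t:.5f} s per raw frame, at most {1 / t:.2f} frames/s over the link")
```

Under these assumptions the link, rather than the accelerator, can become the bottleneck when raw frames are shipped from edge nodes to fog or cloud devices.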
SLIDE 13
At-Scale Approximation
Settings: Increasing input strength over a 24-month deployment cycle
- 1. Why are hardware accelerators necessary?
CPUs (RaspPi@edge, FX6300@cloud) are the worst options
- 2. Power is critical for long-term deployment
The two most cost-efficient options for the edge: Ultra96 (FPGA) and Jetson Nano (embedded GPU)
- 3. Device tradeoff:
FPGAs are hard to use, NCS is not powerful
SLIDE 14
Summary & Limitations
Presents a simple evaluation procedure as a recommendation system to help users select a hardware accelerator device for applications deployed across the cloud-to-edge spectrum.
Cons:
- 1. Only a pure strategy using one single type of device is considered
- 2. One single type of acceleration task is set for all devices
Plan to investigate at-scale deployment of RNN and GAN in edge scenarios
- 3. Assumes ideal device task scheduling and device parallelism
- 4. Interference effects between device executions have not been taken into consideration
SLIDE 15