Execution Time Prediction for Energy-Efficient Hardware Accelerators
Tao Chen, Alex Rucker, and G. Edward Suh
Computer Systems Laboratory, Cornell University
Accelerators in Interactive Computing Systems
- Interactive systems have response time requirements and often use hardware accelerators
- Observation: Finishing earlier than the requirement is usually not needed
- Goal: Perform DVFS for hardware accelerators to save energy while meeting response time requirements
DVFS for Interactive Computing Systems
- Save energy by running slower (lower frequency/voltage)
- Requirement
- Correctly predict each job’s execution time
[Figure: two timelines showing Job 0 and Job 1 against their deadlines, annotated "Predict and Set DVFS Level"]
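As a minimal sketch of how a predicted execution time could be turned into a DVFS level: the function below picks the lowest frequency that still meets the deadline. It assumes that execution time scales inversely with clock frequency and that the prediction is made at the highest frequency. The function name, the frequency levels, and that scaling model are illustrative assumptions, not taken from the slides.

```python
def pick_frequency(predicted_ms_at_fmax, deadline_ms, levels_mhz):
    """Return the lowest frequency (MHz) whose scaled execution time meets the deadline.

    Assumption: execution time is inversely proportional to frequency.
    Falls back to the highest level if no level meets the deadline.
    """
    f_max = max(levels_mhz)
    for f in sorted(levels_mhz):
        # Scale the prediction made at f_max to the candidate frequency.
        scaled_ms = predicted_ms_at_fmax * f_max / f
        if scaled_ms <= deadline_ms:
            return f
    return f_max


# Hypothetical DVFS levels and a 16.7 ms deadline.
LEVELS_MHZ = [250, 500, 750, 1000]
chosen = pick_frequency(8.0, 16.7, LEVELS_MHZ)
print(chosen)
```

A job predicted to take longer than the deadline even at the highest frequency simply runs at that frequency. A real controller would also need to account for voltage and frequency switching time.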
Opportunity and Challenge
- Opportunity: Most jobs finish earlier than the deadline
- Challenge: Irregular variations in job execution time
[Figure: execution time of a video decoding accelerator, with the deadline marked]
Conventional DVFS Controllers
- History-based execution time prediction
- Example: PID controller
- Problem of history-based prediction
- Reactive: decisions lag behind changes
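The lag of history-based prediction can be illustrated with a small PID-style predictor. This is a sketch under assumptions: the gains, the job times, and the class name are made up for illustration and do not come from the talk.

```python
class PIDPredictor:
    """Predicts the next job's execution time from past prediction errors."""

    def __init__(self, kp, ki, kd, initial):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.prediction = initial
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, actual):
        # The error is only known after the job has finished.
        error = actual - self.prediction
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        self.prediction += (self.kp * error
                            + self.ki * self.integral
                            + self.kd * derivative)
        return self.prediction


def predict_sequence(actual_times, predictor):
    """Record the prediction available before each job, then learn from it."""
    predictions = []
    for actual in actual_times:
        predictions.append(predictor.prediction)
        predictor.update(actual)
    return predictions


# Hypothetical execution times (ms) with a single long job at index 3.
times = [5.0, 5.0, 5.0, 12.0, 5.0, 5.0]
preds = predict_sequence(times, PIDPredictor(0.5, 0.1, 0.1, initial=5.0))
for t, p in zip(times, preds):
    print(f"actual={t:5.1f}  predicted={p:5.2f}")
```

In this example the predictor underestimates the long job, because nothing in the history announces it, and then overestimates the following short job because it is still reacting to the spike.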
Predictive DVFS Framework for Accelerators
- Approach: Build predictor hardware for each accelerator that uses job input data to predict execution time
- Design Time: Build predictor and train prediction model
- Identify features related to execution time
- Generate a hardware slice that can calculate features quickly
- Train a prediction model that maps features to execution time
- Run Time: Run predictor to inform DVFS decisions
[Figure: Job Input -> Hardware Slice -> Job Features -> Execution Time Model -> Job Execution Time -> DVFS Model -> DVFS Level]
Features to Capture Execution Time Variation
- Source of variation: input-dependent control decisions
- Feature: State Transition Count
[Figure: FSM with initial state S1 and states S1 to S4, with example state traces over time for Job 1 and Job 2]
STC = [st_{1,2}, st_{1,3}, st_{2,4}, st_{3,4}, st_{4,1}]
Job 1: STC = [2, 0, 2, 0, 2]
Job 2: STC = [1, 1, 1, 1, 2]
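The State Transition Count feature counts how often each FSM transition is taken during a job. Below is a minimal Python sketch, assuming a four-state FSM with five possible transitions and a trace that lists the states in the order they are visited. The two traces and the function name are illustrative assumptions.

```python
from collections import Counter

# Assumed transition list for a four-state FSM (S1 is the initial state).
TRANSITIONS = [("S1", "S2"), ("S1", "S3"), ("S2", "S4"),
               ("S3", "S4"), ("S4", "S1")]


def state_transition_counts(trace, transitions=TRANSITIONS):
    """Return one count per transition, in the order given by `transitions`."""
    # Pair each state with its successor to enumerate the transitions taken.
    counts = Counter(zip(trace, trace[1:]))
    return [counts[t] for t in transitions]


# Illustrative traces: each job takes a different path through the FSM.
job1 = ["S1", "S2", "S4", "S1", "S2", "S4", "S1"]
job2 = ["S1", "S2", "S4", "S1", "S3", "S4", "S1"]
stc1 = state_transition_counts(job1)
stc2 = state_transition_counts(job2)
print(stc1, stc2)
```

Jobs that make different input-dependent control decisions produce different count vectors, which is what lets a model tell them apart.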
Features to Capture Execution Time Variation
- Variable state latency
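[Figure: FSM (initial state S1, states S1 to S4) paired with a counter that is initialized (init) and counts down until done; the FSM waits while !done. Example trace for Job 3: the counter is initialized to 4 and counts 4, 3, 2, 1, done, and is later initialized to 2 and counts 2, 1, done.]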
- Feature: Counter Average Initial Value
- Other counter features in the paper
AIV = [iv_{S3}]
Job 3: AIV = [3]
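The Counter Average Initial Value feature can be sketched as the mean of the values loaded into a counter during a job. The event format below (a list of `(kind, value)` tuples) and the function name are hypothetical, chosen only to illustrate the computation.

```python
def average_initial_value(events):
    """Mean of the values a counter was initialized to during one job.

    `events` is an assumed log of (kind, value) tuples; only "init" events
    contribute. Returns 0.0 if the counter was never initialized.
    """
    inits = [value for kind, value in events if kind == "init"]
    return sum(inits) / len(inits) if inits else 0.0


# Illustrative log: the counter is loaded with 4, counts down, then loaded with 2.
job3_events = [
    ("init", 4), ("tick", 3), ("tick", 2), ("tick", 1), ("done", 0),
    ("init", 2), ("tick", 1), ("done", 0),
]
aiv = average_initial_value(job3_events)
print(aiv)
```

A larger average initial value means the FSM spends more cycles waiting on the counter, so the feature tracks variable state latency.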
Identifying and Extracting Features
- Automated flow based on RTL analysis
- Identify FSM and counter features in RTL
- Instrument RTL to extract features
- More details in the paper
Hardware Slicing
- Need to obtain features before running the accelerator
- Create a minimal version of the accelerator
- Program slicing on accelerator RTL code
- Optimize hardware slice to run fast
[Figure: the accelerator logic consists of a control unit and a datapath; the hardware slice is a subset of that logic, shown with an FSM state trace and counter values (init, 4, 3, 2, 1, done; init, 2, 1, done)]
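One way to picture program slicing on RTL is as a backward traversal of a signal dependency graph: keep only the signals that the feature signals transitively depend on. The sketch below assumes such a graph is already available as a dictionary. The signal names and the graph are hypothetical, and a real flow would work on the RTL itself rather than on a hand-written map.

```python
def backward_slice(deps, targets):
    """Return every signal that the target signals transitively depend on.

    `deps` maps a signal name to the signals that drive it.
    """
    keep = set()
    stack = list(targets)
    while stack:
        sig = stack.pop()
        if sig in keep:
            continue
        keep.add(sig)
        # Follow the drivers of this signal; primary inputs have none.
        stack.extend(deps.get(sig, ()))
    return keep


# Hypothetical dependency graph of a small accelerator.
DEPS = {
    "fsm_state": ["fsm_state", "header_bits", "counter_done"],
    "counter": ["counter", "block_len", "fsm_state"],
    "counter_done": ["counter"],
    "block_len": ["header_bits"],
    "pixel_out": ["datapath_reg", "fsm_state"],
    "datapath_reg": ["datapath_reg", "pixel_in"],
}

slice_signals = backward_slice(DEPS, ["fsm_state", "counter"])
print(sorted(slice_signals))
```

In this toy graph the datapath signals (`pixel_out`, `datapath_reg`, `pixel_in`) fall outside the slice because the FSM and counter features do not depend on them.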
Execution Time Prediction Model
- Train model using convex optimization
- Reduce the number of features
- Prioritize meeting deadlines over saving energy
Linear model: \hat{y} = X b, where \hat{y} is the predicted execution time, X holds the features, and b holds the model coefficients
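A minimal sketch of training a linear model that favors meeting deadlines: under-predictions are penalized more heavily than over-predictions, which keeps the loss convex. The specific loss (asymmetric squared error), the weight of 50, the gradient-descent solver, and the data are all assumptions for illustration. The slides only state that convex optimization is used, and the feature-reduction step is not shown here.

```python
def predict(b, row):
    """Linear model: dot product of the coefficients and one feature row."""
    return sum(bj * xj for bj, xj in zip(b, row))


def train_asymmetric(X, y, under_weight=50.0, lr=0.005, steps=20000):
    """Gradient descent on mean(w_i * (pred_i - y_i)^2).

    w_i is `under_weight` when the model under-predicts, and 1 otherwise.
    """
    n, d = len(X), len(X[0])
    b = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for row, target in zip(X, y):
            err = predict(b, row) - target
            w = under_weight if err < 0 else 1.0
            for j in range(d):
                grad[j] += 2.0 * w * err * row[j] / n
        b = [bj - lr * gj for bj, gj in zip(b, grad)]
    return b


def total_underprediction(b, X, y):
    """Sum of the amounts by which predictions fall short of actual times."""
    return sum(max(0.0, t - predict(b, row)) for row, t in zip(X, y))


# Hypothetical data: one normalized feature plus a constant bias column.
X = [[0.25, 1.0], [0.5, 1.0], [0.75, 1.0], [1.0, 1.0]]
y = [2.0, 4.5, 5.5, 8.5]  # execution times in ms

b_sym = train_asymmetric(X, y, under_weight=1.0)    # ordinary least squares
b_asym = train_asymmetric(X, y, under_weight=50.0)  # conservative model
print(total_underprediction(b_sym, X, y), total_underprediction(b_asym, X, y))
```

The conservative model shifts its predictions upward, trading some energy savings for fewer deadline misses.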
Evaluation Methodology
- Vertically integrated evaluation methodology
- Circuit-level simulation: obtain voltage-frequency relationship
- Gate-level modeling: obtain area, power and energy numbers
- Register-transfer-level simulation: obtain execution time
- Benchmark accelerators
- Deadline: 16.7 ms
Name     Description
h264     Video decoding
cjpeg    Image encoding
djpeg    Image decoding
aes      Cryptography
sha      Cryptography
md       Molecular dynamics
stencil  Image processing
Results: Energy and Deadline Misses
- 36.7% energy savings on average
- 0.4% deadline misses
Results: Overheads of Slice-Based Predictor
- 5.1% area overhead
- 1.5% energy overhead
- 3.5% execution time overhead
More Evaluation Results in Paper
- More detailed experimental results
- Prediction Accuracy Analysis
- Results with Predictor Overheads Removed
- Sensitivity Study on Varying Deadlines
- Platform extensions
- DVFS with Voltage Boosting
- Results for FPGA-based Accelerators
- Results for Accelerators Generated by HLS
Summary
Observation: Finishing faster than the deadline is not needed
Goal: DVFS for accelerators with response time requirements
Solution: Prediction-based DVFS
- Execution time depends on input-dependent control decisions
- Hardware features can be used to capture control decisions
- Proposed a framework to generate predictors automatically
Results: Highly accurate DVFS for accelerators