Execution Time Prediction for Energy-Efficient Hardware Accelerators (PowerPoint Presentation)


SLIDE 1

Execution Time Prediction for Energy-Efficient Hardware Accelerators

Tao Chen, Alex Rucker, and G. Edward Suh Computer Systems Laboratory Cornell University

SLIDE 2

Tao Chen

Accelerators in Interactive Computing Systems

  • Interactive systems have response time requirements and often use hardware accelerators
  • Observation: Finishing earlier than the requirement is usually not needed
  • Goal: Perform DVFS for hardware accelerators to save energy while meeting response time requirements


Cornell University

SLIDE 3

DVFS for Interactive Computing Systems


  • Save energy by running slower (lower frequency/voltage)
  • Requirement: correctly predict each job’s execution time

[Figure: timelines for Job 0 and Job 1; predicting each job's execution time and setting the DVFS level accordingly lets jobs run slower while still finishing by their deadlines]
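Running slower saves energy because dynamic switching energy scales roughly with the square of the supply voltage, and lower frequencies permit lower voltages. A minimal sketch of this trade-off, using illustrative voltage/frequency operating points (not from the presentation):

```python
# Hedged sketch: dynamic energy vs. DVFS level. The E = N * C * V^2 model
# and the operating points below are illustrative assumptions.

def job_energy(n_cycles, voltage, cap=1.0):
    """Dynamic switching energy per job: E = N_cycles * C_eff * V^2."""
    return n_cycles * cap * voltage ** 2

# Assumed DVFS operating points: (frequency in GHz, supply voltage in V).
levels = [(2.0, 1.0), (1.5, 0.85), (1.0, 0.7)]

n_cycles = 10_000_000        # cycles the job needs (illustrative)
deadline_s = 16.7e-3         # per-frame deadline from the talk

# The slowest level that still meets the deadline minimizes energy.
feasible = [(f, v) for f, v in levels if n_cycles / (f * 1e9) <= deadline_s]
best_f, best_v = min(feasible)   # lowest feasible frequency
saving = 1 - job_energy(n_cycles, best_v) / job_energy(n_cycles, 1.0)
# Here all three levels meet the 16.7 ms deadline, so the job runs at
# 1.0 GHz / 0.7 V and uses ~51% less dynamic energy than at 1.0 V.
```

The key point matches the slide: as long as the predicted finish time stays within the deadline, dropping to a lower voltage/frequency pair is pure energy savings.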

SLIDE 4

Opportunity and Challenge

  • Opportunity: Most jobs finish earlier than the deadline
  • Challenge: Irregular variations in job execution time


[Figure: execution time distribution of a video decoding accelerator; most jobs finish well before the deadline, but with irregular job-to-job variation]

SLIDE 5

Conventional DVFS Controllers

  • History-based execution time prediction
  • Example: PID controller
  • Problem of history-based prediction
  • Reactive — decisions lag behind changes


SLIDE 6

Predictive DVFS Framework for Accelerators


  • Approach: Build predictor hardware for each accelerator that uses job input data to predict execution time

  • Design Time: Build predictor and train prediction model
  • Identify features related to execution time
  • Generate a hardware slice that can calculate features quickly
  • Train a prediction model that maps features to execution time
  • Run Time: Run predictor to inform DVFS decisions

[Diagram: Job Input → Hardware Slice → Job Features → Execution Time Model → Predicted Execution Time → DVFS Model → DVFS Level]
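The run-time flow in the diagram can be sketched in a few lines: compute features from the job input, apply the trained linear model, then choose the slowest DVFS level whose predicted finish time still meets the deadline. The coefficients, feature values, and operating points below are illustrative assumptions, not the paper's data:

```python
# Hedged sketch of the run-time predict-then-set-DVFS flow.

def predict_cycles(features, coeffs):
    """Linear execution-time model: predicted cycles = dot(features, coeffs)."""
    return sum(x * c for x, c in zip(features, coeffs))

def pick_dvfs_level(pred_cycles, deadline_s, levels_hz):
    """Pick the slowest (most energy-efficient) frequency that still meets
    the deadline; fall back to the fastest level if none does."""
    for freq_hz in sorted(levels_hz):          # try slowest first
        if pred_cycles / freq_hz <= deadline_s:
            return freq_hz
    return max(levels_hz)

coeffs = [120.0, 45.0, 800.0]      # assumed trained model coefficients
features = [3000, 5000, 10]        # assumed features from the hardware slice
levels_hz = [0.5e9, 1.0e9, 1.5e9, 2.0e9]

cycles = predict_cycles(features, coeffs)           # 593000 cycles
freq = pick_dvfs_level(cycles, 16.7e-3, levels_hz)  # 0.5 GHz suffices here
```

Falling back to the fastest level on an infeasible prediction reflects the talk's priority of meeting deadlines over saving energy.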

SLIDE 7

Features to Capture Execution Time Variation

  • Source of variation: input-dependent control decisions
  • Feature: State Transition Count


[Figure: FSM with states S1–S4; Job 1 and Job 2 take different state sequences over time, so their transition counts differ]

Feature vector of state-transition counts: X = [x_{1,2}, x_{1,3}, x_{2,4}, x_{3,4}, x_{4,1}], where x_{i,j} is the number of transitions from S_i to S_j during a job.
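The state-transition-count feature can be illustrated with a small software model; in hardware, one counter per FSM edge increments as the slice runs. The state names and traces below are illustrative, not the paper's data:

```python
# Hedged sketch: deriving the state-transition-count feature vector
# from an FSM trace (illustrative traces, not the slide's exact data).
from collections import Counter

def transition_counts(trace):
    """Count occurrences of each consecutive (from_state, to_state) pair."""
    return Counter(zip(trace, trace[1:]))

job1 = ["S1", "S2", "S4", "S1", "S2", "S4", "S1"]
job2 = ["S1", "S2", "S4", "S1"]

f1 = transition_counts(job1)   # e.g. f1[("S1", "S2")] == 2
f2 = transition_counts(job2)   # e.g. f2[("S1", "S2")] == 1
```

Because the transitions are input-dependent, two jobs with the same accelerator produce different count vectors, which is exactly what the prediction model consumes.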

SLIDE 8

Features to Capture Execution Time Variation

  • Variable state latency

[Figure: FSM (states S1–S4) paired with a down-counter; on entering a state the counter is initialized (e.g. to 4, then to 2 for Job 3) and the FSM waits until the counter signals done, so the initial value determines that state's latency]

  • Feature: Counter Average Initial Value
  • Other counter features in the paper

Feature vector: X = [c̄], where c̄ is the counter's average initial value; for Job 3 above, c̄ = (4 + 2) / 2 = 3.
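The counter-average-initial-value feature is simple to state in code: each time the FSM loads a loop counter, the loaded value is recorded, and the feature is the mean of those values. A minimal sketch, using the Job 3 values from the slide:

```python
# Hedged sketch: the counter-average-initial-value feature.

def counter_avg_init(init_values):
    """Average of the values a counter is initialized to during one job."""
    return sum(init_values) / len(init_values)

# Job 3 from the slide: the counter is loaded with 4, then with 2.
job3_inits = [4, 2]
feature = counter_avg_init(job3_inits)   # 3.0
```

Since the counter's initial value sets how long the FSM dwells in a state, this feature captures variable state latency that transition counts alone would miss.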

SLIDE 9

Identifying and Extracting Features

  • Automated flow based on RTL analysis
  • Identify FSM and counter features in RTL
  • Instrument RTL to extract features
  • More details in the paper


SLIDE 10

Hardware Slicing

  • Need to obtain features before running the accelerator
  • Create a minimal version of the accelerator
  • Program slicing on accelerator RTL code
  • Optimize hardware slice to run fast

[Diagram: the hardware slice keeps only the accelerator's control logic (FSM and counters) and drops the datapath, so the features can be computed quickly from the job input]

SLIDE 11

Execution Time Prediction Model

  • Train model using convex optimization
  • Reduce the number of features
  • Prioritize meeting deadlines over saving energy


Linear model: t̂ = Xc, where X is the job's feature vector, c is the vector of trained model coefficients, and t̂ is the predicted execution time.
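The "prioritize deadlines over energy" requirement can be expressed as an asymmetric convex loss: under-predicting a job's time (which risks a deadline miss) is penalized more heavily than over-predicting it (which only wastes some energy). The sketch below uses plain gradient descent on such a loss as a stand-in for the paper's convex-optimization solver; the data, weights, and learning rate are illustrative assumptions:

```python
# Hedged sketch: training the linear model t ~= X @ c with an asymmetric
# convex loss. Gradient descent here is a simplification; the paper's
# actual solver and loss formulation may differ.
import numpy as np

def fit_asymmetric(X, t, w_under=10.0, w_over=1.0, lr=1e-2, steps=5000):
    """Minimize mean asymmetric squared error over coefficients c:
    residuals where prediction < actual get weight w_under (deadline risk),
    residuals where prediction > actual get weight w_over (energy waste)."""
    n, d = X.shape
    c = np.zeros(d)
    for _ in range(steps):
        r = X @ c - t                          # predicted minus actual
        w = np.where(r < 0, w_under, w_over)   # heavier penalty for under-prediction
        c -= lr * (X.T @ (w * r)) / n
    return c

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))            # synthetic feature vectors
t = X @ np.array([5.0, 2.0, 1.0]) + rng.normal(0, 0.1, size=200)

c = fit_asymmetric(X, t)
pred = X @ c
# With the 10:1 weighting, the fit deliberately skews toward over-prediction,
# so most jobs are predicted at or above their true execution time.
```

Feature reduction (the second bullet) would additionally add a sparsity-inducing term such as an L1 penalty on c, which keeps the hardware slice small.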

SLIDE 12

Evaluation Methodology


  • Vertically integrated evaluation methodology
  • Circuit-level simulation: obtain voltage-frequency relationship
  • Gate-level modeling: obtain area, power and energy numbers
  • Register-transfer-level simulation: obtain execution time
  • Benchmark accelerators
  • Deadline: 16.7 ms

Name     Description
h264     Video decoding
cjpeg    Image encoding
djpeg    Image decoding
aes      Cryptography
sha      Cryptography
md       Molecular dynamics
stencil  Image processing

SLIDE 13

Results: Energy and Deadline Misses


  • 36.7% energy savings on average
  • 0.4% deadline misses
SLIDE 14

Results: Overheads of Slice-Based Predictor


  • 5.1% area overhead
  • 1.5% energy overhead
  • 3.5% execution time overhead
SLIDE 15

More Evaluation Results in Paper


  • More detailed experimental results
  • Prediction Accuracy Analysis
  • Results with Predictor Overheads Removed
  • Sensitivity Study on Varying Deadlines
  • Platform extensions
  • DVFS with Voltage Boosting
  • Results for FPGA-based Accelerators
  • Results for Accelerators Generated by HLS
SLIDE 16

Summary

Observation: Finishing faster than the deadline is not needed
Goal: DVFS for accelerators with response time requirements
Solution: Prediction-based DVFS

  • Execution time depends on input-dependent control decisions
  • Hardware features can be used to capture control decisions
  • Proposed a framework to generate predictors automatically

Results: Highly accurate DVFS for accelerators


SLIDE 17

Questions?

Execution Time Prediction for Energy-Efficient Hardware Accelerators

Tao Chen, Alex Rucker, and G. Edward Suh Computer Systems Laboratory Cornell University