Power Estimation of a C algorithm on a VLIW Processor (WCED'02, Anchorage, USA)



SLIDE 1
  • E. SENN - LESTER / UBS - WCED'02

Power Estimation of a C algorithm on a VLIW Processor

Nathalie Julien, Eric Senn, Johann Laurent, Eric Martin

LESTER, University of South Brittany, Lorient, France
Eric.Senn@univ-ubs.fr
http://lester.univ-ubs.fr

WCED’02, Anchorage, USA

SLIDE 2

Context

test1(int IU, int JU, int KU)
{
    int i, j, k;
    for (i = 0; i < IU; i++)
        for (j = 0; j < JU; j++) {
            for (k = 8; k < KU; k++)
                A[k] = A[k-8];
            B[i][j] = B[i+1][j] + A[i];
        }
    for (i = 0; i < IU; i++)
        for (j = 0; j < JU; j++)
            B[i][j] = B[i][j] + B[i+1][j];
}

P ?

  • C-level estimation WITHOUT compilation
  • Complete power model

SLIDE 3

Power Estimation

Gate level estimation:

  • very accurate but long simulation time
  • RTL description needed

Instruction Level Power Analysis:

  • accurate
  • limited for VLIW processor
  • compiler dependent
  • memories and pipeline stalls not taken into account

Functional Level Power Analysis:

  • accurate & fast
  • based on architecture analysis
  • compiler independent
  • memories and pipeline stalls taken into account

RTL description not available

SLIDE 4

Methodology: Model Definition

Flow (diagram): Processor, FLPA, Measures, Model Definition, Power Model

Parameters:

  • Algorithmic parameters: parallelism, processing units, cache miss, pipeline stalls...
  • Configuration parameters: frequency, memory mode...

Power model form: P = a α + b (example: P = 4 α + 1)

[Figure: consumption in mapped mode. Current (mA) versus parallelism rate, with curves labelled 80, 133, 160 and 200.]
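The slide gives the model form P = a α + b but does not say how a and b are obtained from the measures. A minimal sketch, assuming an ordinary least-squares fit over (α, P) measurement pairs; the function name `fit_linear_power_model` is hypothetical and not taken from the presentation:

```c
#include <stddef.h>

/* Fit p = a*alpha + b to n measurement pairs by ordinary least squares.
   ASSUMPTION: least squares is an illustrative choice; the slides do not
   state the fitting method.
   Returns 0 on success, -1 if the fit is undefined (fewer than two points,
   or all alpha values identical). */
int fit_linear_power_model(const double *alpha, const double *p, size_t n,
                           double *a, double *b)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;

    if (n < 2)
        return -1;

    for (size_t i = 0; i < n; i++) {
        sx  += alpha[i];
        sy  += p[i];
        sxx += alpha[i] * alpha[i];
        sxy += alpha[i] * p[i];
    }

    double denom = (double)n * sxx - sx * sx;
    if (denom == 0.0)
        return -1;

    *a = ((double)n * sxy - sx * sy) / denom;  /* slope */
    *b = (sy - *a * sx) / (double)n;           /* intercept */
    return 0;
}
```

Fed with points generated from the slide's own example (α = 0.25, 0.5, 0.75, 1 and P = 2, 3, 4, 5, which are synthetic values, not measurements), the fit recovers a = 4 and b = 1.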

SLIDE 5

Methodology: Estimation Process

Model definition (diagram): Processor, FLPA, Measures, Model Definition, Power Model

Estimation process (diagram): C Algorithm, Parameter values, Power Model, C-level Power Estimation

Parameter values:

  • Algorithmic parameters: with prediction models
  • Configuration parameters: with the application
  • Assumptions on the compiler efficiency

Example: P = 4 α + 1 with α = 0.5 gives P = 3 W

SLIDE 6

TI C6x: Model Definition

TI TMS320C6201:

  • VLIW processor, up to 8 instructions in parallel
  • deep pipeline (up to 11 stages)
  • 4 memory modes: mapped, bypass, cache and freeze

FLPA: Functional-Level Power Analysis

[Diagram: processor architecture: program RAM/cache, program/data buses, EMIF, program fetch, instruction dispatch, instruction decode, data RAM, side A and side B (each with 4 processing units and a register file). Functional blocks: DMA, IMU, PU, MMU, CPU, annotated with β, α, γ, ε.]

Parameters:

  • α : parallelism rate
  • β : number of processing units
  • γ : cache miss rate
  • PSR : pipeline stall rate
  • F : clock frequency
  • MM : memory mode

SLIDE 7

TI C6x: Power Model

Power consumption rule in mapped mode:

Pcore = VDD * ([a β (1-PSR) + bm] F + α (1-PSR) [am F + cm] + dm)

Measurements: a = 0.64, am = 5.21, bm = 4.19, cm = 42.401 and dm = 7.6

[Diagram: TMS320C6201 POWER MODEL. Inputs: algorithmic parameters (α, β, γ, PSR) and configuration parameters (F, MM). Output: Pcore.]
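The mapped-mode rule can be evaluated directly once the parameters are known. A minimal sketch in C, using the coefficients quoted on the slide; the function name `pcore_mapped` is hypothetical, and the units of F and of the result are not stated on the slide, so none are assumed here:

```c
/* Coefficients quoted on the slide for mapped mode. */
static const double A_COEF = 0.64, AM = 5.21, BM = 4.19, CM = 42.401, DM = 7.6;

/* Pcore = VDD * ([a*beta*(1-PSR) + bm]*F + alpha*(1-PSR)*[am*F + cm] + dm)
   vdd   : supply voltage
   alpha : parallelism rate, 0..1
   beta  : processing-unit parameter, 0..1
   psr   : pipeline stall rate, 0..1
   f     : clock frequency (unit as used when the coefficients were measured;
           ASSUMPTION: the caller supplies it in that same unit) */
double pcore_mapped(double vdd, double alpha, double beta, double psr, double f)
{
    double active = 1.0 - psr;  /* fraction of cycles not stalled */

    return vdd * ((A_COEF * beta * active + BM) * f
                  + alpha * active * (AM * f + CM)
                  + DM);
}
```

With illustrative inputs chosen here (VDD = 2.5, α = β = 0.5, PSR = 0, F = 200), the expression evaluates to about 3629.5 in the model's own units. With PSR = 1 only the terms in bm and dm remain, so the model then depends on F and VDD alone.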

SLIDE 8

Parameters extraction

X=a+b;
Y=c+d;
for (i=0;i<10;i++)
    y[i]=c[i]*d[i+1];
Z=a+d;
for (j=0;j<50;j++) {
    for(k=0;k<32;k++)
        tab[k]=h[k-1]+l[k+1];
}

  • Loop nests analysis
  • Local parameters prediction (α,β) for each loop nest
  • Global parameters (α,β): average of local values
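The last step, combining local values into a global one, can be sketched as follows. The slide only says "average of local values" and does not say whether the average is weighted (for example by iteration count); this sketch assumes a plain arithmetic mean, and the name `mean_of_local_values` is hypothetical:

```c
#include <stddef.h>

/* Global parameter as the arithmetic mean of the local (per loop nest) values.
   ASSUMPTION: unweighted mean; the slides do not specify any weighting.
   Returns 0.0 for an empty input. */
double mean_of_local_values(const double *local, size_t n)
{
    double sum = 0.0;

    if (n == 0)
        return 0.0;

    for (size_t i = 0; i < n; i++)
        sum += local[i];

    return sum / (double)n;
}
```

The same function would be called once with the local α values and once with the local β values.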

SLIDE 9

Parameters extraction

for (i=0; i<512; i++)
    Y= x[i]*(h[i] + h[i+1] + h[i-1]) + y;

Loop body: 8 instructions = 4 LD, 4 OP
NFP = 1; NPU = 8

α = NFP / NEP ≤ 1 ;  β = (1/8) (NPU / NEP) ≤ 1

PREDICTION MODEL   EP1    EP2          EP3          EP4          α, β
SEQ                8 EP (one instruction per execution packet)   0.125
MAX                2 LD   2 LD, 4 OP                             0.5
MIN                1 LD   1 LD         1 LD         1 LD, 4 OP   0.25
DATA               2 LD   1 LD         1 LD, 4 OP                0.33

NFP: Number of Fetch Packets
NPU: Number of Processing Units
NEP: Number of Execution Packets
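The two ratios above are simple to compute once NFP, NPU and NEP are known. A minimal sketch in C; the function names are hypothetical, and the constant 8 is the number of processing units of the TMS320C6201 as used in the slide's β formula:

```c
#define MAX_UNITS 8  /* processing units available per cycle on the C6201 */

/* alpha = NFP / NEP; returns 0.0 if there is no execution packet. */
double parallelism_rate(unsigned nfp, unsigned nep)
{
    if (nep == 0)
        return 0.0;
    return (double)nfp / (double)nep;
}

/* beta = (1/8) * (NPU / NEP); returns 0.0 if there is no execution packet. */
double processing_unit_rate(unsigned npu, unsigned nep)
{
    if (nep == 0)
        return 0.0;
    return (double)npu / (MAX_UNITS * (double)nep);
}
```

For the loop body on the slide (NFP = 1, NPU = 8), NEP = 8, 2, 4 and 3 give 0.125, 0.5, 0.25 and about 0.33 for both α and β, matching the SEQ, MAX, MIN and DATA rows.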

SLIDE 10

Prediction models

[Diagram: processor architecture (program RAM/cache, data RAM, program/data buses, EMIF, DMA, CPU with program fetch, instruction dispatch, instruction decode, side A and side B), showing which processing units PU1 to PU4 are used under each model.]

  • Max model: full exploitation of the architecture
  • Min model: load/store never executed in parallel
  • Data model: load/store executed in parallel only on different data
  • SEQ model: instructions executed sequentially
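The slides do not give the rules that turn a loop body into a number of execution packets (NEP) for each model. As a rough sketch only, the following C code shows one way to obtain NEP for the SEQ, MAX and MIN models. It assumes that at most two loads can be issued per execution packet in the MAX model (the C6201 has two data paths), one load per packet in the MIN model, at most 8 instructions per packet, and that the remaining operations fit alongside the loads. The DATA model is left out because it needs to know which arrays the loads touch. All names are hypothetical:

```c
enum model { MODEL_SEQ, MODEL_MAX, MODEL_MIN };

/* ceil(a / b) for unsigned integers, b > 0 */
static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Predict the number of execution packets for a loop body made of
   n_ld loads and n_op other operations.
   ASSUMPTIONS (not from the slides): 2 loads per packet in MAX,
   1 load per packet in MIN, 8 instructions per packet at most. */
unsigned predict_nep(enum model m, unsigned n_ld, unsigned n_op)
{
    unsigned total = n_ld + n_op;
    unsigned by_units = ceil_div(total, 8);  /* limit from the 8 units */
    unsigned by_loads;                       /* limit from load issue */

    switch (m) {
    case MODEL_SEQ:
        return total;                        /* one instruction per packet */
    case MODEL_MAX:
        by_loads = ceil_div(n_ld, 2);
        break;
    case MODEL_MIN:
        by_loads = n_ld;
        break;
    default:
        return 0;
    }
    return by_loads > by_units ? by_loads : by_units;
}
```

For a loop body with 4 loads and 4 other operations, this yields 8, 2 and 4 packets for SEQ, MAX and MIN respectively.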

SLIDE 11

Results

Algorithm       Measures               Estimation vs Measures (%)
Application     MM   INT/EXT   P (W)   SEQ     MAX      MIN      DATA
FIR             M    INT       4.5     -39%    +5%      -33%     +5%
FFT             M    INT       2.65    -11%    +12%     -3%      -2.6%
LMS             B    INT       4.97    +1%     +3%      +2%      +3%
LMS             C    INT       5.67    -55%    +5.8%    -16%     +5.8%
DWT 64*64       M    INT       3.75    -25%    +13%     -13%     -5.9%
DWT 64*64       M    EXT       2.55    -10%    +3%      -5.9%    -3.5%
DWT 512*512     M    EXT       2.55    -11%    +2.4%    -7%      -3.9%
EFR vocoder     M    INT       5.08    -50%    +11%     -24%     +1%
MPEG decoder    M    INT       5.82    -54%    +9.6%    -32%     -8%
Average error                          32%     7.8%     17%      4.8%

MM: memory mode.

  • Estimation vs Measures < 8%
  • Minimum and maximum bounds provided
SLIDE 12

Consumption "maps"

  • Consumption maps for the EFR Vocoder

[Figure, in mapped mode: power (W) versus PSR (%), DATA prediction and measure.]

[Figure, in cache mode: power (W) versus cache miss rate (%) and PSR (%).]

SLIDE 13

PSR estimation

  • PSR = NPS / NTC

    – NPS: number of cycles where the pipeline is stalled
    – NTC: total number of cycles

  • NPS = NPSτ + NPSbc + NPSγ

    – NPSτ: external data access - NEXT - Data Mapping (C-level)
    – NPSbc: internal data bank conflict - NCONFLICT - Data Mapping (C-level)
    – NPSγ: program cache misses - NFRAME - Compilation (A-level)
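The two relations above combine into a single ratio. A minimal sketch in C, with a hypothetical function name and parameter names standing in for NPSτ, NPSbc, NPSγ and NTC; how each stall count is derived from NEXT, NCONFLICT and NFRAME is not given on the slide and is not modelled here:

```c
/* PSR = (NPS_tau + NPS_bc + NPS_gamma) / NTC
   nps_ext  : stall cycles due to external data accesses   (NPS_tau)
   nps_bank : stall cycles due to internal bank conflicts   (NPS_bc)
   nps_miss : stall cycles due to program cache misses      (NPS_gamma)
   ntc      : total number of cycles
   Returns 0.0 when ntc is 0 to avoid a division by zero. */
double pipeline_stall_rate(unsigned long nps_ext, unsigned long nps_bank,
                           unsigned long nps_miss, unsigned long ntc)
{
    if (ntc == 0)
        return 0.0;
    return (double)(nps_ext + nps_bank + nps_miss) / (double)ntc;
}
```

For example, 100 + 50 + 50 stalled cycles out of 1000 total cycles (illustrative numbers, not from the presentation) give a PSR of 0.2, that is 20%.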

SLIDE 14

Complexity reduction

              # of code lines     # of lines studied
Application   C       ASM         Number    %C
FFT           77      408         10        13
LMS           30      408         4         13.3
DWT 64*64     46      714         17        37
EFR           118     1323        37        31.2
MPEG          2267    8488        30        1.3

  • Only a portion of the code is to be studied
  • Optimization effort can be focussed
SLIDE 15

Conclusion

  • Original and general approach validated on a VLIW DSP architecture
  • Estimation of minimum and maximum bounds of an algorithm power consumption
  • Fast and accurate power estimation at the C-level (error max = 8%)
  • Refining at the assembly level (error max = 3%)

    – but compilation is needed then

SLIDE 16

Conclusion

  • Co-design HW/SW, SOC
  • High level abstraction decision

    – no compilation
    – no physical measurements
    – no development tools and evaluation boards

  • Fast feedback on software performance

    – hot spots
    – pieces of code not suitable for compilation yet

  • Complexity reduction
SLIDE 17

Current and Future works

  • Development of an automatic tool in progress (available on-line before 2003)
  • Extension of the power model library in progress (TI C55, ARM7)

  • Execution time estimation for energy consumption
  • Generic model for external memories