Research on Ultra Low Power SoC for Media Processing
Satoshi Goto, Waseda University
ISLPED2011, August 1, 2011

Multimedia Applications in Ambient Society
[Figure: Multimedia applications in ambient society]
Application areas: TV conference, surveillance/medical care, automotive, mobile/portable, home entertainment
Keywords: anytime, anywhere, safe, comfortable
Functions: scene recognition / monitoring / control, display, video compression + network security, recognition, digital signal processing
Building blocks: media encoding/decoding, error correction, recognition/synthesis, encryption, pre/post-processing, sensor/actuator, RF, display, environmental information collection, network protocol (wire/wireless)
Example technologies: H.264, JPEG2000, non-standard codec, public key, private key, stream cipher, LDPC, turbo, RS, UWB, 11a/b/g/n, ad hoc, 10GbaseT, 4G, coordinate translation, character recognition, image analysis, digital filter, interpolation, CG
Challenge: How do we realize an ultra-low-power chip?
[Figure: Possible power reduction and observation accuracy of power consumption achieved at each abstraction level (system, algorithm, register transfer, gate, transistor, silicon). From higher to lower levels, possible power reduction: >70%, 50-70%, 15-50%, 5-15%, 3-5%; precision error: >50%, 25-50%, 15-40%, 10-20%, 5-10%. Source: Power management in ASIC/IC (by Synopsys)]
Optimizations at the system, algorithm and register-transfer levels are important for ultra-low-power SoCs.
Dynamic power: $P \propto C \cdot V_{DD}^{2} \cdot f$
C: capacitance
f: clock frequency, $f \propto (V_{DD} - V_{TH})$
V_TH: threshold voltage
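As a rough illustration of why frequency scaling is paired with voltage scaling, the two relations above can be combined numerically. This is a minimal Python sketch, assuming the standard dynamic-power form P ∝ C·VDD²·f with C unchanged, the slide's f ∝ (VDD − VTH), and hypothetical values VDD = 1.0 V and VTH = 0.3 V that do not come from the slides; leakage is ignored.

```python
def scaled_supply(vdd_nominal: float, vth: float, freq_fraction: float) -> float:
    """Supply voltage that gives `freq_fraction` of the nominal clock,
    assuming f is proportional to (VDD - VTH)."""
    return vth + (vdd_nominal - vth) * freq_fraction


def relative_dynamic_power(vdd_nominal: float, vth: float, freq_fraction: float) -> float:
    """Dynamic power relative to the nominal point, assuming P ~ C * VDD^2 * f
    with the switched capacitance C unchanged (leakage ignored)."""
    vdd = scaled_supply(vdd_nominal, vth, freq_fraction)
    return (vdd / vdd_nominal) ** 2 * freq_fraction


# Hypothetical operating point: VDD = 1.0 V, VTH = 0.3 V, clock halved.
vdd_half = scaled_supply(1.0, 0.3, 0.5)
ratio = relative_dynamic_power(1.0, 0.3, 0.5)
print(f"VDD = {vdd_half:.2f} V, dynamic power = {ratio:.3f} x nominal")
```

Halving the clock alone would halve dynamic power; lowering VDD along with it brings this example down to about 0.21 of nominal.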
ASICON2009
[Figure: Research framework]
Application: signal processing, recognition, encoding/decoding, encryption, ECC, network protocol
Low Power Algorithm
Hetero Multi-core Execution Control (Task1-Task4)
Implementation Platform: programmable HW, low power IP core, ASIP, multi-processor
Low Power Design Basic Technologies: clock gating, power gating, floorplan, high-level synthesis, RTL
Our Goal: 1/100 = 1/5 × 1/10 × 1/2
– Low Power Algorithm: 1/5
– Hetero Multi-core Execution Control / Implementation Platform: 1/10
– Low Power Design Basic Technologies: 1/2
Classification
– Classify media data into an “important part” and a “non-important part”
Integration
– Error-correcting coding & video processing
– Encryption & video processing
→ Reduce complexity by 25%~75%
Video compression (e.g., H.264): header, quantization matrix, etc. | DCT coefficients | motion vectors | others
General media information:
– Text information (1000 chars, 16 Kbits)
– Image information (10 still images, 240 Mbits)
Important information: if leaked → meaning (image information) leaked; if lost → information lost (whole image lost)
Non-important information: if leaked, the whole (image) information is not leaked; if lost, the whole (image) information is not lost
Encryption
– Important information: high cipher strength (2000-bit RSA, AES, etc.)
– Non-important information: decrease cipher strength depending on the importance
Error-correcting coding
– Important information: high error-correcting capability (10000-bit LDPC code, etc.)
– Non-important information: decrease error-correcting capability depending on the importance
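The importance-dependent protection policy can be sketched as a simple lookup. This is a minimal Python sketch, not the authors' implementation: the mapping of H.264 data types to importance is an assumption (loosely following H.264 data partitioning), and apart from the 10000-bit code length mentioned on the slide, every parameter value is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Importance(Enum):
    IMPORTANT = "important"
    NON_IMPORTANT = "non-important"


# Hypothetical classification of H.264 data types (an assumption: headers,
# quantization matrices and motion vectors important, DCT residuals not).
CLASSIFICATION = {
    "header": Importance.IMPORTANT,
    "quantization_matrix": Importance.IMPORTANT,
    "motion_vector": Importance.IMPORTANT,
    "dct_coefficient": Importance.NON_IMPORTANT,
}


@dataclass(frozen=True)
class Protection:
    cipher: str            # descriptive label only
    ldpc_code_rate: float  # lower code rate = more redundancy = stronger ECC
    ldpc_code_length: int  # longer codes correct better but cost more
    ldpc_iterations: int   # decoder repetitions, i.e. computing time


# Illustrative parameters, not values reported on the slides.
POLICY = {
    Importance.IMPORTANT: Protection("full strength (e.g. AES)", 0.5, 10000, 20),
    Importance.NON_IMPORTANT: Protection("reduced strength", 0.8, 1000, 5),
}


def protection_for(data_type: str) -> Protection:
    """Pick protection by importance; unknown data types are treated as
    important so that nothing is under-protected by default."""
    return POLICY[CLASSIFICATION.get(data_type, Importance.IMPORTANT)]


for kind in ("header", "motion_vector", "dct_coefficient"):
    print(kind, protection_for(kind))
```

Defaulting unknown data to the strong setting trades a little extra computation for safety; the savings come only from data that is explicitly classified as non-important.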
Integration of Video Encoding and Error-Correcting Coding
Classification of H.264 video data for error-correcting coding:

Error-correcting coding | Important | Non-important
Coding ratio | Low | High
#Repetitions of LDPC code | Large | Small
LDPC code length | Long | Short
Computing time | Large | Small

Sequence | Partition A (important) | Partition B (unimportant)
foreman_qcif | 21659 | 34657
football_qcif | 22312 | 77581
salesman_qcif | 13091 | 33167
container_qcif | 6698 | 25036
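The partition sizes above show why unequal protection saves work: only a minority of the data lands in the important partition. A short Python check using the table's numbers (reported without units, so only ratios are meaningful) gives roughly 21% (container) to 38% (foreman) important data, leaving the bulk for the cheaper, weaker protection path.

```python
# Partition sizes from the table (Partition A = important, B = unimportant).
PARTITIONS = {
    "foreman_qcif": (21659, 34657),
    "football_qcif": (22312, 77581),
    "salesman_qcif": (13091, 33167),
    "container_qcif": (6698, 25036),
}


def important_share(a: int, b: int) -> float:
    """Fraction of the data that falls into the important Partition A."""
    return a / (a + b)


for name, (a, b) in PARTITIONS.items():
    print(f"{name}: {100 * important_share(a, b):.1f}% important")
```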
Experiment for Unequal ECC in Video Encoder
[Figure: PSNR (video quality) of foreman and container under NOUEP, UEP1, UEP2 and UEP3 (proposed)]
Power reduction (integrated vs. independent): Foreman 25.5%, Football 25.4%, Container 56.6%
SASIMI 2009, ITC-CSCC 2010
Computation time 3 : 1
►video conference systems, etc
Region of interest (ROI): video content that should be encoded with higher priority
– Attracts human attention
– Important for overall performance
ROI-based video coding: flexibly assigns encoding resources
– Resources: bit budget, encoding power consumption
– Advantages: higher subjective quality, higher overall performance
[Figure: normal encoding vs. ROI encoding (ROI ≡ human face); estimation step without continuity check, and final results]
Experimental results: 77% of encoding time is reduced (IEICE Trans. on Electronics, Apr. 2011)
BDPSNR: average PSNR difference in dB over the whole range of bit rates
BDBR: average bit-rate difference in % over the whole range of PSNR
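BDPSNR and BDBR are the Bjøntegaard delta metrics. A minimal Python sketch of BD-PSNR, assuming four (bit rate, PSNR) points per curve as is common practice: with exactly four points a cubic in log10(rate) passes through all of them, so Lagrange interpolation stands in for the usual least-squares cubic fit, and Simpson's rule (exact for cubics) averages the gap over the overlapping rate range. BDBR works the same way with the axes swapped and is not shown; the rates and PSNR values below are made up.

```python
import math
from typing import Sequence


def _lagrange(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the polynomial interpolating the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test) -> float:
    """Average PSNR gain (dB) of `test` over `ref` across the overlapping
    log10(rate) range, using a cubic through exactly 4 RD points per curve."""
    for seq in (rate_ref, psnr_ref, rate_test, psnr_test):
        if len(seq) != 4:
            raise ValueError("this sketch expects exactly 4 RD points per curve")
    x_ref = [math.log10(r) for r in rate_ref]
    x_test = [math.log10(r) for r in rate_test]
    lo = max(min(x_ref), min(x_test))
    hi = min(max(x_ref), max(x_test))
    if hi <= lo:
        raise ValueError("bit-rate ranges do not overlap")

    def gap(x: float) -> float:
        return _lagrange(x_test, psnr_test, x) - _lagrange(x_ref, psnr_ref, x)

    # Simpson's rule is exact for cubics: mean gap over [lo, hi].
    mid = (lo + hi) / 2
    return (gap(lo) + 4 * gap(mid) + gap(hi)) / 6


rates = [100, 200, 400, 800]              # kbit/s, hypothetical
ref_psnr = [30.0, 33.0, 35.5, 37.5]       # dB, hypothetical
test_psnr = [p + 0.3 for p in ref_psnr]   # uniformly 0.3 dB better
print(f"BD-PSNR = {bd_psnr(rates, ref_psnr, rates, test_psnr):.2f} dB")
```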
(by motion, lighting and so on) to start encoding
– Regardless of foreground or background
– Regardless of human or not
[Figure: frames n and n+1; area without difference (content is not changed) and area with difference]
Similarity estimation: each MB is compared with the collocated MB in the previous frame using
– Norm distance of chrominance components
– Coarse motion information
[Figure: original encoder with early skip mode decision vs. proposed encoder with similarity detector and skip-mode decision: video frame → MB → mode decision → coded MBs → bit stream writer → compressed bit stream]
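The similarity test can be sketched as follows. This is a minimal Python sketch under stated assumptions: the slide names a "norm distance of chrominance components" and "coarse motion information" but gives neither the norm nor the thresholds, so the L2 norm, `dist_threshold`, `mv_threshold` and the function names are all hypothetical.

```python
import math
from typing import Sequence, Tuple


def l2_distance(cur: Sequence[int], ref: Sequence[int]) -> float:
    """Euclidean norm of the sample-wise difference of two equally sized blocks."""
    if len(cur) != len(ref):
        raise ValueError("blocks must have the same number of samples")
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(cur, ref)))


def can_skip(cur_cb: Sequence[int], cur_cr: Sequence[int],
             ref_cb: Sequence[int], ref_cr: Sequence[int],
             coarse_mv: Tuple[int, int],
             dist_threshold: float = 8.0, mv_threshold: int = 1) -> bool:
    """Hypothetical similarity test against the collocated MB of the previous
    frame: chroma must be close and the coarse motion must be negligible."""
    dist = l2_distance(cur_cb, ref_cb) + l2_distance(cur_cr, ref_cr)
    mvx, mvy = coarse_mv
    return dist <= dist_threshold and max(abs(mvx), abs(mvy)) <= mv_threshold


# 8x8 chroma blocks of a 4:2:0 macroblock, flattened.
flat = [128] * 64
changed = [128] * 63 + [200]
print(can_skip(flat, flat, flat, flat, (0, 0)))     # True
print(can_skip(changed, flat, flat, flat, (0, 0)))  # False
```

Only chroma and coarse motion are inspected, which is far cheaper than a full mode decision; MBs that pass go straight to skip mode.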
[Figure: Multi-core H.264 encoder with DVFS: the input video is split into slices 0 … N-1, one per core (Core 0 … Core N-1); per core, difference detection and mode decision (MB counts Ne, Ns) feed workload estimation and frequency mapping under the performance constraint fS; the bit stream writer joins the per-core outputs into the compressed bit stream]
Performance Constraint Deviation (PCD) D_k: deviation of the encoding speed from the performance constraint fS
[Equation: PCD D_k in terms of Ne, fS and N]
– For surveillance video encoding applications, fS generally equals 30: even if the encoding energy is reduced, the encoding speed must stay ≥ 30 fps to maintain QoS.
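Frequency selection under a frame-rate constraint can be sketched as picking the lowest available level that still finishes a slice within 1/fS. This is a minimal Python sketch: the frequency levels are the 600/300/150/75 MHz used in the experiment, but the per-MB cycle costs and the function names are hypothetical, and the actual workload estimation on the slides may differ.

```python
FREQS_MHZ = (600, 300, 150, 75)  # levels used in the experiment


def estimate_cycles(n_encoded: int, n_skipped: int,
                    cycles_per_encoded_mb: int = 200_000,
                    cycles_per_skipped_mb: int = 5_000) -> int:
    """Hypothetical workload model: cycles a core needs for its slice,
    from the counts of fully encoded and skipped macroblocks."""
    return n_encoded * cycles_per_encoded_mb + n_skipped * cycles_per_skipped_mb


def pick_frequency(cycles: int, fs: float = 30.0, freqs=FREQS_MHZ) -> int:
    """Lowest available frequency (MHz) that finishes `cycles` within 1/fs s,
    so that the encoding speed stays at or above fs frames per second."""
    for f_mhz in sorted(freqs):
        if cycles / (f_mhz * 1e6) <= 1.0 / fs:
            return f_mhz
    return max(freqs)  # constraint cannot be met: run at the highest level


# A QCIF frame has 99 MBs; with 4 cores each slice has roughly 25 MBs.
print(pick_frequency(estimate_cycles(0, 25)))   # static scene -> 75
print(pick_frequency(estimate_cycles(20, 5)))   # mostly changed slice -> 150
```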
– Renesas RP1 evaluation board with 2 SuperH SH-4A processors
System for Experiment and Results (ICIP2010 & ICME2010)
Real demo is shown at ULP Exhibition
– Dynamic processor frequency assignment (DFS): dynamically select the frequency from 600, 300, 150, 75 MHz
– Parallel execution on 4 cores by dividing the screen into 4 parts
– Video data: Street (QCIF)

Coding scheme | Normal | Proposed | Reduction
CPU frequency (MHz) | 600 | 300 | 50%
Total coding time (s) | 3775.2 | 68.4 | 98.2%
Power consumption (W) | 2.8 | 2.2 | 23.4%
Energy consumption (kJ) | 10.63 | 0.15 | 98.6%
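The table's energy column follows from E = P × t, and the reduction column from (normal − proposed) / normal. A short Python check using the table's values: the power figures are rounded to one decimal, so P × t lands near, not exactly on, the reported energies.

```python
def reduction(normal: float, proposed: float) -> float:
    """Reduction of the proposed scheme relative to the normal one, in percent."""
    return 100.0 * (normal - proposed) / normal


# Values from the table (Street, QCIF)
time_s = {"normal": 3775.2, "proposed": 68.4}
power_w = {"normal": 2.8, "proposed": 2.2}
energy_kj = {"normal": 10.63, "proposed": 0.15}

for scheme in ("normal", "proposed"):
    estimate_kj = power_w[scheme] * time_s[scheme] / 1000.0  # E = P * t
    print(f"{scheme}: P*t = {estimate_kj:.2f} kJ (table: {energy_kj[scheme]} kJ)")

print(f"coding-time reduction: {reduction(time_s['normal'], time_s['proposed']):.1f}%")
print(f"energy reduction: {reduction(energy_kj['normal'], energy_kj['proposed']):.1f}%")
```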
Power reduction by switching off unused circuitry
►Fine-Grain Power Gating (PG) Using Controlling Value
Power reduction by temporarily disabling clock signals
►Optimum Sharing of Clock Gating (CG) Control Logic
Power reduction using Network on Chip
►Synthesis for Low Power Application-Specific Network-on-Chip
Power reduction by optimizing the trade-off between power and performance
►Dynamic frequency control under performance constraint
IEICE Trans. on Fundamentals, Dec. 2009
– The controlling value of a gate input can power off other blocks: 20% power reduction found for ISCAS85 benchmarks
– The control input is manipulated as another input of each controlled gate: with commercial synthesis & layout tools, a 10% reduction was found
Multi-stage Power Gating
– Fine-grain PG is applied inside power-gating blocks: 10% extra reduction
[Figure: original circuit (glitch 101 exists), pseudo power gating, power gating using controlling value, and multi-stage power gating; the controlling value can cut off the power of other blocks, with glitch reduction]
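The controlling-value idea can be illustrated at the level of a single gate: once one input of an AND/NAND holds 0 (or of an OR/NOR holds 1), the output no longer depends on the other inputs, so the logic driving them is a power-gating candidate. This is a minimal Python sketch of that decision only; the block-level switch insertion, control timing and glitch handling shown on the slide are not modeled, and `gateable_inputs` is a hypothetical name.

```python
from typing import List, Optional, Set

# Controlling value per gate type: one input at this value fixes the output.
CONTROLLING_VALUE = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def gateable_inputs(gate_type: str, inputs: List[Optional[int]]) -> Set[int]:
    """Indices of inputs whose driving logic is a power-gating candidate.

    `inputs` holds 0/1 for known signal values and None for unknown ones.
    If some input carries the controlling value, the gate output no longer
    depends on the other inputs, so the blocks driving them may be powered off.
    """
    cv = CONTROLLING_VALUE.get(gate_type)
    if cv is None:
        return set()  # e.g. XOR/XNOR have no controlling value
    controllers = [i for i, v in enumerate(inputs) if v == cv]
    if not controllers:
        return set()
    keep = controllers[0]  # one controlling input suffices to hold the output
    return {i for i in range(len(inputs)) if i != keep}


print(gateable_inputs("AND", [0, None, 1]))  # {1, 2}
print(gateable_inputs("OR", [0, None]))      # set()
```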
Optimum CG Sharing in Commercial Tools (IEICE Trans. on Fundamentals, Dec. 2010)
– A BDD-based method is applied to gate-level circuits generated by logic synthesis
– LDPC decoder: 9% reduction compared with the structural method; 38% for a serial-parallel interface with our single-stage optimization
Multi-stage Clock Gating Synthesis
– Gated clock is applied to other CG logic: about 9% reduction w.r.t. the single-stage method for binary counters and interface circuits
[Figure: multi-stage CG signal extraction]
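The slide's method uses BDDs as a canonical form to find clock-enable functions that are logically equivalent even when their gate-level structures differ. A minimal Python sketch of the same grouping step, swapping BDDs for exhaustive truth tables (only practical for a handful of inputs); the register names and enable functions are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple

EnableFn = Callable[..., bool]


def truth_table(fn: EnableFn, n_inputs: int) -> Tuple[int, ...]:
    """Canonical form of a Boolean function: its complete truth table."""
    return tuple(int(bool(fn(*bits))) for bits in product((0, 1), repeat=n_inputs))


def share_clock_gates(enables: Dict[str, EnableFn], n_inputs: int) -> List[List[str]]:
    """Group registers whose clock-enable functions are logically equivalent;
    each group can then be driven by a single shared clock-gating cell."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for reg, fn in enables.items():
        groups[truth_table(fn, n_inputs)].append(reg)
    return sorted(sorted(regs) for regs in groups.values())


# Enables over inputs (a, b, c): r0..r2 are structurally different but equal.
enables = {
    "r0": lambda a, b, c: a and b,
    "r1": lambda a, b, c: b and a,
    "r2": lambda a, b, c: not ((not a) or (not b)),
    "r3": lambda a, b, c: a or c,
}
print(share_clock_gates(enables, 3))  # [['r0', 'r1', 'r2'], ['r3']]
```

A structural comparison would keep r0, r1 and r2 apart; a canonical functional form merges them, which is where the extra savings over a structural method come from.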
Greatly improves the success rate, wirelength and runtime compared with a floorplanner from Michigan Univ. and a floorplanner from NTU, respectively
– Wirelength: 12% and 7% reduction
– Runtime: 2.7× to 9.1× speedup
[Figure: wirelength and runtime vs. number of blocks and aspect ratio]
IEICE Trans. on Fundamentals, April 2009
Compared with the flat framework: 9.4% reduction of wirelength, 15.6% reduction of time

Circuit | Blocks | Flat: wire | Flat: time (s) | Multilevel: wire | Multilevel: time (s)
n100 | 100 | 207,989 | 3.2 | 203,624 | 2.9
n200 | 200 | 371,345 | 10.4 | 365,577 | 8.2
n300 | 300 | 521,365 | 18.4 | 491,517 | 16.0
Prmy1 | 752 | 94,264 | 59.5 | 83,974 | 45.5
CB37 | 1810 | 4,788,047 | 115.6 | 3,770,716 | 97.2
Prmy2 | 2907 | 473,282 | 202.5 | 403,707 | 182.4
(CB37 improvement: 21.2% wirelength, 15.9% time)
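The average improvements can be recomputed from the per-circuit results. This Python sketch assumes the first wire/time pair in each row is the flat result and the second the multilevel one, which is what the reported CB37 improvements imply. Averaging per-circuit reductions reproduces the 9.4% wirelength figure; the time average comes out at about 15.5%, close to the reported 15.6% given the rounded times.

```python
# (circuit, flat wire, flat time [s], multilevel wire, multilevel time [s])
RESULTS = [
    ("n100", 207_989, 3.2, 203_624, 2.9),
    ("n200", 371_345, 10.4, 365_577, 8.2),
    ("n300", 521_365, 18.4, 491_517, 16.0),
    ("Prmy1", 94_264, 59.5, 83_974, 45.5),
    ("CB37", 4_788_047, 115.6, 3_770_716, 97.2),
    ("Prmy2", 473_282, 202.5, 403_707, 182.4),
]


def improvement(before: float, after: float) -> float:
    """Reduction of `after` relative to `before`, in percent."""
    return 100.0 * (before - after) / before


wire_gain = [improvement(fw, mw) for _, fw, _, mw, _ in RESULTS]
time_gain = [improvement(ft, mt) for _, _, ft, _, mt in RESULTS]
for (name, *_), w, t in zip(RESULTS, wire_gain, time_gain):
    print(f"{name}: wirelength -{w:.1f}%, time -{t:.1f}%")
print(f"average: wirelength -{sum(wire_gain) / len(wire_gain):.1f}%, "
      f"time -{sum(time_gain) / len(time_gain):.1f}%")
```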
Compared with results obtained by commercial tools only: area 25%, wire 10%, wire delay 8% reduction
Circuit: 49,497 nets, 44,531 cells
Partitioning: 300 blocks + 24 macro blocks
Floorplan: determine the position of macro blocks
Synopsys design flow
Results of DC: TSMC 0.18µm CMOS, 418Mbps@200MHz, memory: 24 memory blocks (area: 1,695,501)
Total area: 8,012,999; power: 712.38mW

Metric | Design (without FP) | Proposed design | Reduction
Area | 16,319,256 | 11,923,480 | 25%
Delay | 6.208 | 5.713 | 8%
Wire length | 18,651,412 | 16,842,454 | 10%
(Yoshimura Lab, 2011-8-4)
1080P H.264 Encoder LSI (ISSCC2007, IEEE JSSC2009)
2.4Gbps AES Encryption (ICSEC2008)
Tamper-resistant AES: 3rd place in ISLPED2010 Design Contest
189mW@820Mb/s OFDM/UWB Baseband (A-SSCC2009)
1080P H.264/MPEG/AVS decoder LSI (VLSI Symp2009)
A 530Mpixels/s 4096x2160@60fps H.264/AVC High Profile decoder LSI (VLSI Symp2010)
3rd place in ISLPED2010 Design Contest; Best Student Paper Award at VLSI Symp. 2011
Ultra Low Power QC-LDPC Decoder with High Parallelism (IEEE SOCC2011)
[Figure: chip micrographs]
[Figure: decoder core and DRAM chips; DRAM bandwidth]
– >8 GB/s average bandwidth for 4Kx2K@60fps: cannot be handled even with the fastest DDR2 DRAM
– DRAM pins: wafer/package cost
– DRAM power: a multiple of the core power
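The scale of the bandwidth problem follows from the pixel rate. A short Python calculation: 4096 × 2160 × 60 gives the 530 Mpixels/s in the 4K decoder's title, and the quoted >8 GB/s (taking 1 GB = 10⁹ bytes) corresponds to about 15 bytes of DRAM traffic per decoded pixel. The 1.5 bytes/pixel for writing decoded frames assumes 8-bit 4:2:0 output, which is an assumption here; it shows that output writes alone are only about 0.8 GB/s, so most of the traffic comes from other accesses such as frame reads and line-buffer traffic.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Decoded pixels per second."""
    return width * height * fps


rate = pixel_rate(4096, 2160, 60)
print(f"pixel rate: {rate / 1e6:.1f} Mpixel/s")

# Assumption: 8-bit 4:2:0 output, i.e. 1.5 bytes per pixel.
output_write = rate * 1.5
print(f"writing decoded frames alone: {output_write / 1e9:.2f} GB/s")

avg_bw = 8e9  # the slide quotes > 8 GB/s average DRAM bandwidth
print(f"> {avg_bw / rate:.1f} bytes of DRAM traffic per decoded pixel")
```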
– Frame Rd. ↓, Line Buf. Wr./Rd. ↓
– Frame Rd. ↓, Frame Wr. ↓
[Figure: typical DRAM bandwidth portion, with and without caching, split into frame read, frame write, line buffer write/read and rest (46%, 32%, 16%, 6%)]
[Figure: DRAM bandwidth in M words (128-bit), split into frame write, frame read and line buffer W/R:
– VLSI'09 [1]: 147.4 (1317mW, DDR2/667)
– PMBR: 115.3, 22% less (1090mW, DDR2/533)
– PMBR + VCR-LFRC: 71.7 (552mW, LPDDR/333)]
Technology | SMIC 90nm G CMOS
Voltage | 1.0V core, 1.8V/2.5V IO
Die size | 4x4 mm²
Package | 176-pin LQFP
Gates/SRAM | 662K/59.6KB
DRAM | 64b LPDDR
Core power | 189mW@175MHz (4096x2160@60fps); 176mW@166MHz (3840x2160@60fps); 48mW@36MHz (1920x1080@60fps)
[Figure: chip micrograph: H.264/AVC video decoder core, DDR PHY (data 32b ×2), DDR CMD, PLL]
[Figure: demo system: DRAM, video decoder chip, JTAG, UART, FPGA]
VLSI Symp.2010

 | This Work | VLSI'09 [1] | JSSC'07 [2]
Video format(s) | H.264 HP | H.264 HP, MPEG-1/2, AVS | H.264 MP
Resolution@fps | 4096x2160@60 | 1920x1080@60 | 1920x1080@30
Gates/SRAM | 662K/59.6KB | 367K/11.0KB | 160K/4.5KB
DRAM config. | 64b DDR | 32b DDR | 32b DDR + 32b SDR
Technology | 90nm/1.0V | 0.13μm/1.2V | 0.18μm/1.8V
Core power | 189mW | 257mW | 320mW
Scaled & norm. core power (nJ/pixel) | 0.36 | 1.0* | 0.79**
Norm. DRAM power (nJ/pixel) | 1.11 | 2.65 |
** $P_{90} = P_{180} / (V_{180}/V_{90})^{2} / (C_{180}/C_{90}) = P_{180} / 6.48$
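The ** normalization can be checked numerically. This Python sketch assumes, as the footnote implies, that power scales with V² and that capacitance is proportional to feature size (C180/C90 = 2), with 1.8 V and 1.0 V supplies; dividing the scaled core power by the pixel rate then gives the per-pixel figures, which come out at about 0.79 nJ/pixel for JSSC'07 and 0.36 nJ/pixel for this work.

```python
def scale_power(power_w: float, v_from: float, v_to: float,
                node_from_nm: float, node_to_nm: float) -> float:
    """Normalize dynamic power to another voltage and process node, assuming
    P ~ C * V^2 and capacitance proportional to the feature size."""
    return power_w / (v_from / v_to) ** 2 / (node_from_nm / node_to_nm)


def energy_per_pixel_nj(power_w: float, width: int, height: int, fps: int) -> float:
    """Core energy per decoded pixel, in nanojoules."""
    return power_w / (width * height * fps) * 1e9


factor = (1.8 / 1.0) ** 2 * (180 / 90)  # 0.18um/1.8V -> 90nm/1.0V
jssc07 = scale_power(0.320, 1.8, 1.0, 180, 90)
print(f"scaling factor: {factor:.2f}")
print(f"JSSC'07 core power at 90nm/1.0V: {jssc07 * 1e3:.1f} mW")
print(f"JSSC'07: {energy_per_pixel_nj(jssc07, 1920, 1080, 30):.2f} nJ/pixel")
print(f"This work: {energy_per_pixel_nj(0.189, 4096, 2160, 60):.2f} nJ/pixel")
```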
1920x1080, 30 frames/s
TSMC 0.18µm CMOS 1P6M, 5.44mm × 4.98mm (= 27.1 mm²)
64Mb System-in-Silicon DRAM
Clock: 200MHz; power: 1409mW (DRAM included)
Logic gates: 1140K; SRAM: 108KB
[Figure: power comparison (memory and processor parts) of NTU ASIC (720P), NTU ASIC (1080P) and our SoC design (1080P): ½ power reduction]
Layout of proposed decoder (IEEE SOCC2011)

 | This work | VLSIC2010 [19] | JSSC2008 [21] | VLSIC2007 [22]
Technology | 65nm | 0.13µm | 90nm | 0.13µm
Supply voltage | 1.2V | 1.2V | 1.0V | 1.2V
Code length | 576~2304
Code rate | 1/2, 2/3A, 2/3B, 3/4A, 3/4B, 5/6 | 1/2
Cycles/iteration | 24~48 | 48~54 | ~160 | ~350
Logic gate count | 597k | 470k | 380k | 420k
Memory bits | 56,448 | 72,522 | 89,856 | 76,800
Total gate count (incl. memory) | 968k | 946k | 970k | 924k
Frequency | 110MHz | 214MHz | 150MHz | 83.3MHz
Iteration number | 10 | 10 | 20 | 2~8
Throughput (Mbps) | 1056 | 955 | 105 | 111
Power (mW) | 115 | 397 | 264 | 52
Normalized power (mW) | 230 | 397 | 484 | 52
Normalized energy (pJ/bit/iteration) | 21.8 | 42 | 230 | 216
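The per-bit-per-iteration figures follow from normalized power ÷ (throughput × iterations), since mW / (Mbit/s) is nJ/bit. A short Python check on the three designs with a fixed iteration count; VLSIC2007 is left out because its iteration count is given only as a range.

```python
def pj_per_bit_per_iteration(power_mw: float, throughput_mbps: float,
                             iterations: int) -> float:
    """Energy per decoded bit per iteration in pJ.
    mW / (Mbit/s) gives nJ/bit; divide by iterations and convert to pJ."""
    return power_mw / (throughput_mbps * iterations) * 1e3


# (normalized power [mW], throughput [Mbps], iterations) from the table
designs = {
    "This work": (230, 1056, 10),
    "VLSIC2010": (397, 955, 10),
    "JSSC2008": (484, 105, 20),
}
for name, (p, tput, it) in designs.items():
    print(f"{name}: {pj_per_bit_per_iteration(p, tput, it):.1f} pJ/bit/iteration")
```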
This project is supported by CREST to transfer the developed technology to real-world products as quickly as possible.
Human detection algorithm on a low power platform SoC with programmable hardware
– Low power platform SoC, task-based runtime system and human detection algorithm (Waseda University, Renesas)
– Extract parallelism
– Run at lower clock speed
→ Less than 1/40 of the energy consumption of a desktop PC
Human detection core algorithm based on the HOT feature developed by Waseda University: input image → set of detection windows → human / non-human → result
The algorithm is decomposed into primitive tasks for the task graph.
[Figure: human detection core algorithm (down scale, gray scale, mean shift, gradient, status table, histogram, SVM) decomposed into primitive tasks; the task graph (N = 26) is handled by the runtime system's task graph manager and multi-task scheduler; tasks are processed by the CPU and/or STP of the XBridge system, controlled by the runtime system]
STP: Stream Transpose Processor
Power measurement: each supply rail (CPU, chipset, MC, memory, etc.) is measured independently; measurement interval = processing time for 1 image
– Desktop PC (Core2Duo@3GHz) mother board, 551 ms per image: 12V: CPU; 12V: chipset, misc.; 5V: chipset, memory, misc.; 3.3V: chipset, misc. (GPU excluded)
– XBridge evaluation board (MIPS 4KEc@200MHz, STP@44.4MHz), 486 ms per image: 1.0V: CPU (CPU, system bus, etc.); 1.0V: STP; 1.8V: XBridge (except CPU and STP); 1.8V: memory
[Figure: measured current (mA) during one measurement interval on each platform]
Real demo is shown at ULP Exhibition
– Energy consumption: [overall] 2.4% (1/42), [processor part] 3.5% (1/28)
– Processing speed: 118%

Platform | Processor | Processing time / image [s] | Relative performance
Desktop PC | Core2Duo@3GHz | 0.582 | 100.0%
XBridge | MIPS 4KEc@266MHz | 17.311 | 3.4%
XBridge | 4KEc@200MHz + STP@44.4MHz | 0.494 | 117.9%

Platform | Energy / image, processor part [J] | Energy / image, system part [J] | Energy / image, total [J] | Relative energy, processor part | Relative energy, total
Desktop PC (Core2Duo@3GHz) | 13.036 | 14.925 | 27.961 | 100.0% | 100.0%
XBridge (4KEc@266MHz) | 10.968 | 5.746 | 16.714 | 84.1% | 59.8%
XBridge (4KEc@200MHz + STP@44.4MHz) | 0.460 (CPU 0.251, STP 0.208) | 0.203 | 0.663 | 3.5% (1/28) | 2.4% (1/42)
[Figure: Energy consumption comparison between desktop PC and XBridge]
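The headline ratios follow directly from the energy and time tables. A short Python check: the processor part comes to about 1/28, the whole system to about 1/42, and the speed to about 118% of the desktop PC.

```python
# Energy per image [J] and processing time per image [s] from the tables.
DESKTOP = {"processor": 13.036, "system": 14.925, "time": 0.582}
XBRIDGE_STP = {"processor": 0.460, "system": 0.203, "time": 0.494}


def total_energy(platform: dict) -> float:
    """Processor part plus system part, in joules per image."""
    return platform["processor"] + platform["system"]


proc_ratio = XBRIDGE_STP["processor"] / DESKTOP["processor"]
total_ratio = total_energy(XBRIDGE_STP) / total_energy(DESKTOP)
speedup = DESKTOP["time"] / XBRIDGE_STP["time"]

print(f"processor part: {100 * proc_ratio:.1f}% (about 1/{1 / proc_ratio:.0f})")
print(f"overall: {100 * total_ratio:.1f}% (about 1/{1 / total_ratio:.0f})")
print(f"relative speed: {100 * speedup:.0f}%")
```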
[Figure: research framework annotated with per-application results: 1/3 (2mW), 1/5 (40mW), 1/10 (50mW), 1/5 (100mW), 1/5 (100mW), 1/40 (0.6J)]

Layer | Goal | Achieved
Low Power Algorithm | 1/5 | 1/3~1/5
Hetero Multi-core Execution Control / Implementation Platform | 1/10 | 1/3~1/10
Low Power Design Basic Technologies | 1/2 | 2/3~1/2
Total | 1/100 | 2/27~1/100
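The totals are the products of the per-layer factors, assuming the layers' reductions multiply independently, as the goal 1/100 = 1/5 × 1/10 × 1/2 implies. A short Python check with exact fractions confirms that the achieved ranges multiply out to 2/27~1/100.

```python
from fractions import Fraction
from math import prod

# Power-reduction factors per layer: goal, and achieved range (weakest, strongest).
GOAL = {
    "Low Power Algorithm": Fraction(1, 5),
    "Hetero Multi-core / Platform": Fraction(1, 10),
    "Basic Technologies": Fraction(1, 2),
}
ACHIEVED = {
    "Low Power Algorithm": (Fraction(1, 3), Fraction(1, 5)),
    "Hetero Multi-core / Platform": (Fraction(1, 3), Fraction(1, 10)),
    "Basic Technologies": (Fraction(2, 3), Fraction(1, 2)),
}

goal_total = prod(GOAL.values())
weakest_total = prod(weakest for weakest, _ in ACHIEVED.values())
strongest_total = prod(strongest for _, strongest in ACHIEVED.values())
print(f"goal: {goal_total}")                             # 1/100
print(f"achieved: {weakest_total} ~ {strongest_total}")  # 2/27 ~ 1/100
```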