designing for low power

Designing for Low Power - PowerPoint PPT Presentation



  1. Advanced Digital IC-Design: Designing for Low Power

Content: Reduce Power at All Levels
All abstraction levels are important: System & Algorithm, Architecture & Arithmetic, Logic, Circuit, Device.
- Large savings on the system level. Example, the Alamouti algorithm: the information source $[u_1 \; u_2]$ is mapped to the code matrix $\begin{bmatrix} c_1 & -c_2^* \\ c_2 & c_1^* \end{bmatrix}$.
- Architecture & arithmetic: e.g. the ADD/SUB units.
- Large savings on the technology level (circuit, V_DD, device).

Change in Digital Research
The traditional focus in IC design has been speed (delay reduction for high speed) and area (area reduction). Complexity grew as more functions were added. Power is the recent goal and opens a new design space.

Why is Power Important?
- Low power to reduce heat: heat sinks and fans are expensive.
- Impact on product lifetime and mean time between failures: the chip failure rate doubles for every 10-20 °C increase in temperature.
- Mobile computing: low power to increase the time between battery charges.
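
The reliability rule of thumb above can be written as a formula: if the failure rate doubles every $\Delta T_d$ degrees, a temperature rise $\Delta T$ multiplies it by $2^{\Delta T/\Delta T_d}$. The Python sketch below assumes this simple exponential model; the function name and the 30 °C example are hypothetical, only the 10-20 °C doubling range comes from the slide.

```python
def failure_rate_factor(delta_t_c: float, doubling_interval_c: float) -> float:
    """Relative failure rate after a temperature rise of delta_t_c (degrees C),
    assuming (hypothetically) that the rate doubles every doubling_interval_c degrees."""
    if doubling_interval_c <= 0:
        raise ValueError("doubling interval must be positive")
    return 2.0 ** (delta_t_c / doubling_interval_c)


if __name__ == "__main__":
    # Hypothetical 30 C rise, evaluated at both ends of the slide's 10-20 C range.
    for interval in (10.0, 20.0):
        factor = failure_rate_factor(30.0, interval)
        print(f"doubling every {interval:.0f} C -> failure rate x{factor:.2f}")
```

A 30 °C rise thus costs between a factor of about 2.8 and 8 in failure rate, depending on where in the 10-20 °C range the doubling interval lies.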

  2. Power Consumption

Power Dissipation
Two measures are important:
- Peak power (sets wire dimensions): $P_{peak} = V_{DD} \times i_{DD,max}$
- Average power (battery and cooling): $P_{av} = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt$

[Figure: energy efficiency in MIPS/mW (log scale, 0.01 to 100) for StrongARM, Pentium, TI-DSP and ASIC, spanning 4 orders of magnitude. Source: R. Brodersen, Berkeley]

Dynamic Power Consumption
Energy charged in a capacitor: $E_C = C_L V_{DD}^2 / 2$. The energy $E_C$ is also discharged, i.e. $E_{tot} = C_L V_{DD}^2$ per charge/discharge cycle. Switching at frequency $f$, the power consumption is $P = f C_L V_{DD}^2$.

Static Power Consumption
In this case the leakage is sub-threshold leakage through the open (off) transistor: $P_{stat} = I_{leakage} \times V_{DD}$. $I_{leakage}$ increases with decreasing $V_T$.
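
The peak, average and dynamic power formulas above can be evaluated numerically. This is a minimal Python sketch, assuming the supply current is available as uniformly spaced samples; the integral in $P_{av}$ is approximated with the trapezoidal rule, and the waveform, frequency and capacitance values are hypothetical.

```python
from typing import Sequence


def peak_power(v_dd: float, i_samples: Sequence[float]) -> float:
    """P_peak = V_DD * i_DD,max (the value that sets wire dimensions)."""
    return v_dd * max(i_samples)


def average_power(v_dd: float, i_samples: Sequence[float], dt: float) -> float:
    """P_av = (V_DD / T) * integral of i_DD(t) over [0, T].

    The integral is approximated with the trapezoidal rule over current
    samples taken dt seconds apart.
    """
    if len(i_samples) < 2:
        raise ValueError("need at least two samples")
    t_total = dt * (len(i_samples) - 1)
    charge = sum((a + b) * dt / 2.0 for a, b in zip(i_samples, i_samples[1:]))
    return v_dd * charge / t_total


def dynamic_power(f: float, c_load: float, v_dd: float) -> float:
    """P = f * C_L * V_DD^2, i.e. E_tot = C_L * V_DD^2 per charge/discharge cycle."""
    return f * c_load * v_dd ** 2


if __name__ == "__main__":
    i_dd = [0.0, 1e-3, 2e-3, 1e-3, 0.0]        # hypothetical current samples [A]
    print(peak_power(1.2, i_dd))               # 1.2 V times the 2 mA peak
    print(average_power(1.2, i_dd, dt=1e-9))   # 1.2 V times the 1 mA average
    print(dynamic_power(100e6, 10e-12, 1.2))   # 100 MHz, 10 pF, 1.2 V
```

The example waveform has a peak power twice its average power, which is why the two measures are sized separately: wires for the peak, battery and cooling for the average.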

  3. CMOS Power Consumption

$P_{tot} = P_{dyn} + P_{stat} = \alpha f C_L V_{DD}^2 + I_{leakage} V_{DD}$, where $\alpha$ = probability of switching.

An Example: Dynamic vs. Static
In the hearing-aid area, devices in the mW range are produced. About 25% of the power is static power consumption in a 130 nm low-power technology. In the 90 nm node the static power will exceed the dynamic power, and in the 65 nm node the static power will dominate totally. Source: Oticon

Optimum V_DD vs. V_T
With $f_{CLK}$ constant:
$P_{Switching} = \alpha \cdot f_{CLK} \cdot C_L \cdot V_{DD}^2$
$P_{Leakage} = I_{off} \cdot V_{DD}$
[Figure: P_TOTAL, P_Switching and P_Leakage versus supply voltage V_DD (V) and V_T, showing a flat minimum.]
Performance defines the operating point.
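
The optimum V_DD/V_T trade-off above can be illustrated with a brute-force sweep. This Python sketch rests on two loudly hypothetical simplifications not taken from the slide: the off-current falls one decade per 0.1 V of $V_T$ (a sub-threshold swing model), and "constant $f_{CLK}$" is approximated by a fixed gate overdrive $V_{DD} - V_T = 0.5$ V. All parameter values are illustrative.

```python
# All parameter values are hypothetical, chosen only to make the trade-off visible.
ACTIVITY = 0.1   # alpha, switching probability
F_CLK = 100e6    # constant clock frequency [Hz]
C_L = 1e-9       # total switched capacitance [F]
I_0 = 1.0        # extrapolated off-current at V_T = 0 [A]
SWING = 0.1      # sub-threshold swing [V/decade]
V_OV = 0.5       # fixed overdrive V_DD - V_T, a stand-in for "constant performance" [V]


def power_terms(v_t: float) -> tuple[float, float, float]:
    """Return (P_total, P_switching, P_leakage) for a given threshold voltage."""
    v_dd = v_t + V_OV                         # simplified constant-performance constraint
    p_switching = ACTIVITY * F_CLK * C_L * v_dd ** 2
    i_off = I_0 * 10.0 ** (-v_t / SWING)      # one decade less leakage per SWING volts
    p_leakage = i_off * v_dd
    return p_switching + p_leakage, p_switching, p_leakage


def optimum_v_t(v_min: float = 0.1, v_max: float = 0.6, steps: int = 51) -> tuple[float, float]:
    """Brute-force sweep of V_T; returns (V_T, P_total) at the minimum found."""
    grid = [v_min + (v_max - v_min) * k / (steps - 1) for k in range(steps)]
    best = min(grid, key=lambda v: power_terms(v)[0])
    return best, power_terms(best)[0]


if __name__ == "__main__":
    v_t, p = optimum_v_t()
    print(f"optimum V_T = {v_t:.2f} V, V_DD = {v_t + V_OV:.2f} V, P_total = {p * 1e3:.2f} mW")
```

With these numbers the minimum lies around $V_T \approx 0.3$ V, and neighbouring grid points differ by well under 1%, which mirrors the flat minimum in the figure: lowering $V_T$ further lets leakage explode, raising it forces a higher $V_{DD}$ and more switching power.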

  4. Reduce Power at all Levels

- System: system partitioning, power-down
- Algorithm: complexity, bit optimization
- Architecture: parallelism, pipelining
- Circuit/Logic: sizing, logic styles, logic design
- Technology: threshold voltage
Result: reduced power consumption, with large savings at the top and bottom levels.

System-on-Chip
Off-chip connections have a high capacitive load. System integration will reduce off-chip data transfer; ideally a single-chip solution.

Partitioning for Low Power
Traditional design (processor core + main memory): high flexibility, low throughput, high power (in both processor and memory).
Partitioned design (processor core + main memory + ASIC accelerator + distributed memory): partly flexible, lower power.
- Dedicated hardware (less overhead): higher throughput and lower power
- Local busses and less data transfer: lower power
- Distributed memory: smaller memories, lower power

Clock Gating and Power Down
[Figure: CLK gated by Enable A, Enable B and Enable C into Module A, Module B and Module C.]
Only active modules should be clocked! Clock gating is no solution for leakage!
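
The two clock-gating rules above (only active modules should be clocked; gating does not touch leakage) can be captured in a small power model. This Python sketch is a simplification under hypothetical assumptions: each module has a clock/register capacitance switched whenever it is clocked, a logic capacitance switched only when it does useful work (the activity factor is folded into both), and a leakage current that always flows. The `Module` class and all values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    c_clock: float   # clock-tree + register capacitance switched every clocked cycle [F]
    c_logic: float   # effective logic capacitance switched only when doing useful work [F]
    i_leak: float    # leakage current [A]
    active: bool     # does the module have useful work in this interval?


def chip_power(modules: list[Module], f_clk: float, v_dd: float, clock_gating: bool) -> float:
    """Sum of switching power and leakage power over all powered modules."""
    total = 0.0
    for m in modules:
        clocked = m.active or not clock_gating   # with gating, only active modules see CLK
        if clocked:
            total += m.c_clock * f_clk * v_dd ** 2
        if m.active:
            total += m.c_logic * f_clk * v_dd ** 2
        total += m.i_leak * v_dd                 # leakage flows whether clocked or not
    return total


if __name__ == "__main__":
    mods = [
        Module("A", 5e-12, 20e-12, 1e-6, active=True),
        Module("B", 5e-12, 20e-12, 1e-6, active=False),
        Module("C", 5e-12, 20e-12, 1e-6, active=False),
    ]
    for gating in (False, True):
        print(f"clock gating={gating}: {chip_power(mods, 100e6, 1.2, gating) * 1e3:.3f} mW")
```

Gating removes exactly the clock power of the idle modules B and C; an idle, gated module still contributes $I_{leak} V_{DD}$, which is why power-down or sleep transistors are needed for leakage.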

  5. Clock Gating or Input Disabling

Combinational logic can be selectively turned off by the input disabling logic (IL), which consists of transparent latches with an enable signal EN. When units are executing useful calculations, EN makes the latches transparent, thus permitting normal operation. If there is no useful calculation, the latches retain their previous state and no transitions propagate through the inactive units. This method is known as guarded evaluation, for which both a theoretical framework and algorithms that automatically decide when logic units performing useless calculations should be shut down have been provided. Compared with the clock-gating technique, this method is less power effective because the power in the clock line is not saved. However, when two functions share the same register but never work simultaneously, the register must remain active and clock gating cannot be exploited. In this case, by disabling the inputs to either of the two functions, it is still possible to reduce the power. Also, an input-disabling strategy is safer than clock gating in terms of timing issues. Note that neither method can avoid leakage power, since leakage does not depend on signal transitions.

Power Modes
- Active mode
- Idle mode, e.g. ready with task or clock gated
- Sleep or standby mode for quick recovery; the memory content is often kept
- Power Down or Hibernate mode for maximum power savings

Sleep Transistors to Control Leakage
Modules are turned off in standby or when not needed. The technique is built on the stacking effect.
[Figure: V_DD connected through sleep transistors to Module A, Module B and Module C.]

Sleep Control: Example
Pacemaker in a noisy environment. Approach: flexible filtering (dual mode). In the low-noise case advanced filtering is not needed, so modules B and C are turned off.
[Figure: x(n) and T(x(n)) versus time [ms], 0 to 10000 ms.]
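
The input-disabling (guarded evaluation) idea above can be sketched by counting bit toggles at the inputs of a combinational unit, a common proxy for its dynamic power. This Python sketch assumes a shared operand bus and a per-cycle EN signal; when guarding is on, the transparent latches hold their last value while EN is low. Function names and the operand sequence are hypothetical, and latch and clock power are not modelled.

```python
def bit_transitions(prev: int, new: int) -> int:
    """Number of bits that toggle between two successive input words."""
    return bin(prev ^ new).count("1")


def input_transitions(operands: list[int], enables: list[bool], guarded: bool) -> int:
    """Count bit toggles seen at a combinational unit's inputs.

    operands: values arriving on the shared bus each cycle
    enables:  EN per cycle (True = the unit's result is actually needed)
    guarded:  if True, transparent latches hold the last value while EN is low
    """
    latched = 0
    toggles = 0
    for value, en in zip(operands, enables):
        new = value if (en or not guarded) else latched   # latch transparent only when EN
        toggles += bit_transitions(latched, new)
        latched = new
    return toggles


if __name__ == "__main__":
    ops = [3, 12, 5, 10, 7]                     # hypothetical bus traffic
    en = [True, False, False, True, False]      # result needed in cycles 0 and 3 only
    print("unguarded:", input_transitions(ops, en, guarded=False))
    print("guarded:  ", input_transitions(ops, en, guarded=True))
```

In this trace the guarded unit sees 4 input toggles instead of 15; the clock still reaches the shared register, which is the case where clock gating could not be used.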

  6. Flexible Wavelet Based Filter Structure

Normal mode (most of the time): the patient is not subjected to noise sources (e.g. resting), so low-complexity filtering is sufficient.
Alert mode: the patient is subjected to noise sources (e.g. physically active), so the add-on blocks (generalized likelihood ratio test and wavelet filter) are used. The add-ons are turned off in normal mode (sleep transistors).
[Figure: signal chain from the pacemaker through filtering, threshold function and pulse generator, with a noise detector selecting dual-mode filtering; add-ons: generalized likelihood ratio test and wavelet filter.]

A Dual-Mode Detector in UMC 0.13 μm
Core power consumption reduced by 68% (simulated) at nominal V_DD = 1.2 V.
[Figure: power in nW (0 to 120), split into dynamic, leakage, short-circuit and total, for alert mode, sleep mode and dual mode; leakage is dominating.]

Example: ATMEL PicoPower Processor
- 340 μA in Active mode
- 150 μA in Idle mode
- 0.65 μA in Sleep mode
- 0.1 μA in Power Down mode

Intelligent Energy Management (Source: ARM)
Run the task as slowly as possible by lowering V_DD; this saves energy.
[Figure: two plots of performance versus time.]
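
The ATMEL PicoPower currents above show why the choice of waiting mode matters more than the active current for a mostly idle device. This Python sketch computes the time-weighted average supply current for a duty-cycled schedule; the mode currents are the slide's, while the 1% duty cycle and the function name are hypothetical.

```python
# Supply currents quoted on the slide for the ATMEL PicoPower example [uA].
MODE_CURRENT_UA = {"active": 340.0, "idle": 150.0, "sleep": 0.65, "power_down": 0.1}


def average_current_ua(time_fractions: dict[str, float]) -> float:
    """Time-weighted average supply current [uA] for a duty-cycled schedule."""
    if abs(sum(time_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(MODE_CURRENT_UA[mode] * share for mode, share in time_fractions.items())


if __name__ == "__main__":
    # Hypothetical duty cycle: busy 1 % of the time, waiting the rest.
    print(average_current_ua({"active": 0.01, "idle": 0.99}))   # wait in Idle mode
    print(average_current_ua({"active": 0.01, "sleep": 0.99}))  # wait in Sleep mode
```

Waiting in Sleep instead of Idle lowers the average from about 152 μA to about 4.0 μA, roughly a factor of 37, even though the active phase is unchanged.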

  7. Dynamic Voltage Scaling

Decreased V_DD: quadratic reduction of power, linear increase of delay. Goal: just-in-time processing.

Low-Energy SoC Design Platform
Pre-characterized for worst-case conditions; adapts to environmental and process variations. The frequency-voltage relationship is stored in a LUT.

Example: Dynamic Voltage Scaling
Scenario: a system needs to run 10^9 cycles within 25 seconds. Minimize the energy consumption.

V_DD [V]   Energy per cycle [nJ]   f_max [MHz]
5.0        40                      50
4.0        25                      40
2.5        10                      25

(a) 50 MHz at 5.0 V: finished after 20 s, $E_a = 10^9 \times 40\,\text{nJ} = 40$ J.
(b) 50 MHz at 5.0 V for 15 s, then 25 MHz at 2.5 V for 10 s: $E_b = 7.5\cdot10^8 \times 40\,\text{nJ} + 2.5\cdot10^8 \times 10\,\text{nJ} = 32.5$ J.
(c) 40 MHz at 4.0 V for the full 25 s: $E_c = 10^9 \times 25\,\text{nJ} = 25$ J.
[Figure: V_DD^2 versus t [s] (0 to 25 s) for cases a, b and c.]
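
The three schedules in the voltage-scaling example above can be checked with a few lines of Python. The operating points are taken from the slide's table; the schedule format (a list of voltage and cycle-count pairs) is a hypothetical choice, and, as in the example, energy spent idling after the work is done is assumed to be zero.

```python
# Operating points from the slide: V_DD [V] -> (f_max [Hz], energy per cycle [J])
OPERATING_POINTS = {5.0: (50e6, 40e-9), 4.0: (40e6, 25e-9), 2.5: (25e6, 10e-9)}


def schedule_energy(schedule: list[tuple[float, float]]) -> tuple[float, float]:
    """schedule: (v_dd, cycles) pairs run back to back at f_max.
    Returns (total energy [J], total run time [s]); idle energy is assumed zero."""
    energy = time = 0.0
    for v_dd, cycles in schedule:
        f_max, e_cycle = OPERATING_POINTS[v_dd]
        energy += cycles * e_cycle
        time += cycles / f_max
    return energy, time


if __name__ == "__main__":
    deadline = 25.0
    scenarios = {
        "a: 5.0 V, then idle": [(5.0, 1e9)],
        "b: 5.0 V for 15 s, then 2.5 V": [(5.0, 7.5e8), (2.5, 2.5e8)],
        "c: 4.0 V throughout": [(4.0, 1e9)],
    }
    for name, sched in scenarios.items():
        e, t = schedule_energy(sched)
        print(f"{name}: {e:.1f} J in {t:.1f} s (meets deadline: {t <= deadline + 1e-9})")
```

All three schedules meet the 25 s deadline, but the one that stretches the work exactly to the deadline at the lowest sufficient voltage (case c) uses the least energy, which is the just-in-time processing goal.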

  8. Repeaters to Reduce the Bus Delay

Repeaters: Buffers Neglected
Consider a 20 mm long wire, 1 μm wide: R = 2 kΩ, C = 20 pF.
No repeater: $t_p = 0.69\,RC = 0.69 \times 2000 \times 20\times10^{-12} = 27.6$ ns
Two segments: $t_p = 0.69 \times 2 \times \frac{R}{2} \times \frac{C}{2} = 0.69 \times 2 \times 1000 \times 10\times10^{-12} = 13.8$ ns
Three segments: $t_p = 0.69 \times 3 \times \frac{R}{3} \times \frac{C}{3} = 9.2$ ns

Repeaters: Buffers Included
Buffer data: $R_{eq} = 200\,\Omega$, $C_{out} = 0.1$ pF, $C_{in} = 0.1$ pF.
With n repeaters (n + 1 segments of R/(n+1) and C/(n+1)):
$t_p = 0.69\,(n+1)\left((n+1)R_{eq} + \frac{R}{n+1}\right)\left(\frac{C}{n+1} + (n+1)(C_{out} + C_{in})\right)$
Example, n = 2: $t_p = 0.69 \times 3 \times \left(3 \times 200 + \frac{2000}{3}\right)\left(\frac{20}{3} + 3 \times 0.2\right)\times10^{-12} = 19.1$ ns

Repeaters: Minimum Delay
[Figure: t_p (ns, 0 to 35) versus number of repeaters (0 to 5), with buffers included and excluded; with buffers included, the minimum delay is reached with two repeaters.]
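
The delay expressions above can be tabulated for n = 0 to 5 repeaters to reproduce the minimum-delay plot. This Python sketch implements the two formulas exactly as given on the slide (it does not substitute a different repeater model); the wire and buffer values are the slide's, while the function names are hypothetical.

```python
R_WIRE, C_WIRE = 2e3, 20e-12                  # 20 mm wire: R = 2 kOhm, C = 20 pF
R_EQ, C_OUT, C_IN = 200.0, 0.1e-12, 0.1e-12  # repeater data from the slide


def delay_buffers_excluded(n: int) -> float:
    """0.69 * (n+1) * (R/(n+1)) * (C/(n+1)): wire split into n+1 ideal segments [s]."""
    k = n + 1
    return 0.69 * k * (R_WIRE / k) * (C_WIRE / k)


def delay_buffers_included(n: int) -> float:
    """Slide formula with repeater resistance and capacitances included [s]."""
    k = n + 1
    return 0.69 * k * (k * R_EQ + R_WIRE / k) * (C_WIRE / k + k * (C_OUT + C_IN))


if __name__ == "__main__":
    for n in range(6):
        print(f"n={n}: excluded {delay_buffers_excluded(n) * 1e9:5.1f} ns, "
              f"included {delay_buffers_included(n) * 1e9:5.1f} ns")
    best = min(range(6), key=delay_buffers_included)
    print("minimum delay with buffers included at n =", best)
```

Ignoring the buffers, delay keeps falling as 1/(n+1); once the repeater resistance and capacitances are charged, the benefit saturates and the minimum occurs at n = 2 (about 19.1 ns), matching the plot.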
