
Lecture 2 (I): Pipelining & Retiming

Lecture 2 (I): Pipelining & Retiming. Hsie-Chia Chang, E-mail: hcchang@mail.nctu.edu.tw, Fall 2006. Outline: Pipelining of FIR Digital Filters, Data-Broadcast Structures


  1. Lecture 2 (I): Pipelining & Retiming
     張錫嘉 Hsie-Chia Chang, E-mail: hcchang@mail.nctu.edu.tw, Fall 2006

  2. Outline
     - Pipelining of FIR Digital Filters
       - Data-Broadcast Structures
       - Fine-Grain Pipelining
     - Parallel Processing
     - Pipelining and Parallel Processing for Low Power
     - Retiming
       - Definitions and Properties
       - Solving Systems of Inequalities
       - Retiming Techniques
         - Cutset Retiming & Pipelining
         - Retiming for Clock Period Minimization
         - Retiming for Register Minimization
     Optimized Application-Specific Integrated Systems

  3. Introduction
     - If a real-time application requires a faster input rate, the critical path can be reduced by either pipelining or parallel processing.

  4. Pipelining & Parallel Processing (1/2)
     - Pipelining ($T_{sample} = T_{CLK}$)
       - Reduce the effective critical path by introducing pipelining latches along the critical datapath
       - Without any pipelining latches, the critical path can be reduced by
     - Parallel processing ($T_{sample} \neq T_{CLK}$)
       - Increase the sampling rate by replicating hardware so that several inputs can be processed in parallel and several outputs can be produced at the same time
     - These techniques are applied to non-recursive computations

  5. Pipelining & Parallel Processing (2/2)
     Example 2:

  6. Pipelining of FIR Digital Filters
     - $T_{critical} = T_M + T_A$
     - Schedule of events in the pipelined FIR filter

  7. Cutset Pipelining (1/2)
     - The speed is limited by the longest path between
       - any two latches
       - an input and a latch
       - a latch and an output
       - the input and the output
     - 2-level pipelined structure
       - The longest path can be reduced by suitably placing the pipelining latches in the architecture
       - In this system, at any time, 2 consecutive outputs are computed in an interleaved manner
       - Drawbacks

  8. Cutset Pipelining (2/2)
     - Cutset
     - Feed-forward cutset
       - We can arbitrarily place latches on a feed-forward cutset of any FIR filter structure without affecting the functionality of the algorithm
     (Figure labels: subgraphs G1 and G2, +kD on the cutset edges)
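
The feed-forward cutset rule can be checked with a small clocked simulation. The sketch below is illustrative only: it assumes a 3-tap direct-form filter y(n) = a·x(n) + b·x(n-1) + c·x(n-2), and the particular cutset (one latch on the partial-sum edge, one on the delay-line edge) and all function and variable names are hypothetical choices, not taken from the slides. With one latch on every edge of the cutset, the output should equal the original output delayed by one cycle.

```python
def fir_direct(x, a, b, c):
    """Reference 3-tap FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), zero initial state."""
    def at(n):
        return x[n] if n >= 0 else 0
    return [a * at(n) + b * at(n - 1) + c * at(n - 2) for n in range(len(x))]


def fir_cutset_pipelined(x, a, b, c):
    """Same filter with one latch on each edge of an assumed feed-forward cutset.

    Assumed partition: G1 = {a and b multipliers, first adder, first delay},
    G2 = {second delay, c multiplier, second adder}. The two edges from G1
    to G2 (partial sum, delay line) each receive one pipelining latch.
    """
    d1 = d2 = 0               # the filter's own delay elements
    latch_sum = latch_x = 0   # pipelining latches placed on the cutset
    out = []
    for xn in x:
        # Combinational logic evaluated from the current register outputs.
        partial = a * xn + b * d1        # G1: short path, one multiply and one add
        out.append(latch_sum + c * d2)   # G2: short path, one multiply and one add
        # Clock edge: every register loads its input at the same time.
        d2 = latch_x
        latch_x = d1
        latch_sum = partial
        d1 = xn
    return out


x = [1, 2, 3, 4, 5, 6]
reference = fir_direct(x, 1, 2, 3)
pipelined = fir_cutset_pipelined(x, 1, 2, 3)
# The pipelined output is the reference output delayed by one clock cycle.
print(reference)
print(pipelined)
```

The extra cycle of latency is the price of the shorter combinational path; the sample values themselves are unchanged.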

  9. Example 3.2.1

  10. Data-Broadcast Structures

  11. Fine-Grain Pipelining

  12. Parallel Processing
     - Parallel processing is also referred to as block processing
       - Block size = number of inputs processed in a clock cycle
       - For a 3-tap FIR filter $y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$, the duplicated hardware computes:
         $y(3k) = a\,x(3k) + b\,x(3k-1) + c\,x(3k-2)$
         $y(3k+1) = a\,x(3k+1) + b\,x(3k) + c\,x(3k-1)$
         $y(3k+2) = a\,x(3k+2) + b\,x(3k+1) + c\,x(3k)$
     - Block delay
     - In MIMO,
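
The three block equations can be checked against the serial filter with a short sketch. It assumes zero initial conditions (x(n) = 0 for n < 0) and an input length that is a multiple of 3; the function names are hypothetical.

```python
def fir_serial(x, a, b, c):
    """Serial 3-tap FIR: one output per clock cycle."""
    def at(n):
        return x[n] if n >= 0 else 0
    return [a * at(n) + b * at(n - 1) + c * at(n - 2) for n in range(len(x))]


def fir_3_parallel(x, a, b, c):
    """3-parallel (block size 3) FIR: three outputs per clock cycle k."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of the block size 3")

    def at(n):
        return x[n] if n >= 0 else 0

    y = []
    for k in range(len(x) // 3):
        # One clock cycle: the three replicated filters run side by side.
        y0 = a * at(3 * k) + b * at(3 * k - 1) + c * at(3 * k - 2)
        y1 = a * at(3 * k + 1) + b * at(3 * k) + c * at(3 * k - 1)
        y2 = a * at(3 * k + 2) + b * at(3 * k + 1) + c * at(3 * k)
        y.extend([y0, y1, y2])
    return y


x = [1, 2, 3, 4, 5, 6]
print(fir_serial(x, 1, 2, 3) == fir_3_parallel(x, 1, 2, 3))
```

The loop over k runs once per clock cycle, so the block version needs a third as many cycles as the serial one for the same input stream, at the cost of three times the arithmetic hardware.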

  13. Complete Parallel Processing Systems
     - A serial-to-parallel converter
     - A parallel-to-serial converter

  14. Why Use Parallel Processing?
     - Communication bounded
       - When the critical path is shorter than $T_{communication}$, the I/O bound dominates and the system is communication bounded.
       - Pipelining can be used only up to the point where the critical path is limited by the communication bound.
       - Once this point is reached, pipelining can no longer increase the speed.

  15. Combined Pipelining & Parallel Processing
     - After combining M-level pipelining and L-level parallel processing,

  16. CMOS Power Consumption (1/2)
     - $P_{total} = P_{dynamic} + P_{short\text{-}circuit} + P_{static}$
     - Short circuit: current spikes
     - Static power: leakage current

  17. CMOS Power Consumption (2/2)
     - Based on a simple approximation and first-order analysis
       - Propagation delay: $T_{pd} = \dfrac{C_{charge}\,V_0}{k\,(V_0 - V_t)^2}$
         - $C_{charge}$: the capacitance to be charged or discharged in a single clock cycle (along the critical path)
         - $V_0$, $V_t$: the supply voltage and the threshold voltage
         - $k$: a function of technology parameters
       - Power consumption: $P = C_{total}\,V_0^2\,f$
         - $C_{total}$: the total capacitance of the CMOS circuit
         - $f$: clock frequency of the circuit
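
The two first-order formulas above can be written as a minimal sketch. The function names and the numeric values in the usage lines are hypothetical and serve only to show the scaling behaviour; they are not taken from the slides.

```python
def propagation_delay(c_charge, v0, vt, k):
    """First-order delay model: T_pd = C_charge * V0 / (k * (V0 - Vt)^2)."""
    if v0 <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return c_charge * v0 / (k * (v0 - vt) ** 2)


def dynamic_power(c_total, v0, f):
    """First-order power model: P = C_total * V0^2 * f."""
    return c_total * v0 ** 2 * f


# Hypothetical numbers, chosen only to illustrate the trends.
p_full = dynamic_power(c_total=1e-12, v0=2.0, f=1e6)
p_half = dynamic_power(c_total=1e-12, v0=1.0, f=1e6)
print(p_half / p_full)  # halving V0 cuts power to a quarter

t_full = propagation_delay(c_charge=1e-12, v0=5.0, vt=1.0, k=1e-3)
t_low = propagation_delay(c_charge=1e-12, v0=3.0, vt=1.0, k=1e-3)
print(t_low > t_full)   # a lower supply voltage makes the circuit slower
```

The quadratic dependence of power on the supply voltage, together with the delay penalty of lowering it, is what makes pipelining and parallel processing usable for power reduction: the speed margin they create can be traded for a lower supply voltage.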

  18. Low Power Design
     - To reduce
       - Capacitances
         - Transistor/gate capacitance
         - Load capacitance
         - Interconnects
         - External
       - Activity
       - Frequency
       - Power supply
     - Other issues
       - Off-chip connections have high capacitive load
       - System integration

  19. Pipelining for Low Power (1/2)
     - For an M-level pipelined architecture,
       - the critical path is reduced to 1/M of its original length, and the capacitance to be charged or discharged in a single cycle ($C_{charge}$) is also reduced to 1/M
     - If the same clock speed is maintained ($f = 1/T_{pd}$),
       - only 1/M of the non-pipelined capacitance has to be charged or discharged per cycle, which leaves room for voltage reduction
       - Suppose the voltage can be reduced to $\beta V_0$; the power consumption becomes
         $P_{pipelined} = C_{total}\,(\beta V_0)^2\,f = \beta^2\,P_{non\text{-}pipelined}$

  20. Pipelining for Low Power (2/2)
     - Propagation delay of the original architecture: $T_{seq} = \dfrac{C_{charge}\,V_0}{k\,(V_0 - V_t)^2}$
     - Propagation delay of the pipelined architecture: $T_{pip} = \dfrac{(C_{charge}/M)\,\beta V_0}{k\,(\beta V_0 - V_t)^2}$
     - Setting the two delays equal gives the following quadratic equation, which can be solved for $\beta$:
       $M\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2$
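
A minimal sketch of solving this quadratic numerically follows. The function name `solve_beta` and the example voltages are hypothetical and are not the values of Example 3.4.1. The sketch expands the equation into standard quadratic form and keeps the larger root, since only that root keeps the scaled supply $\beta V_0$ above the threshold voltage.

```python
import math


def solve_beta(v0, vt, m):
    """Solve m * (beta*v0 - vt)^2 = beta * (v0 - vt)^2 for the usable beta.

    Expanded form: (m*v0^2) * beta^2 - (2*m*v0*vt + (v0 - vt)^2) * beta + m*vt^2 = 0.
    The product of the two roots is (vt/v0)^2, so exactly one root lies above
    vt/v0; that is the larger one, and it is the one returned here.
    """
    if v0 <= vt or m < 1:
        raise ValueError("expected v0 > vt and m >= 1")
    a = m * v0 ** 2
    b = -(2 * m * v0 * vt + (v0 - vt) ** 2)
    c = m * vt ** 2
    disc = b * b - 4 * a * c
    return (-b + math.sqrt(disc)) / (2 * a)


# Hypothetical operating point: 5 V supply, 0.6 V threshold, 3-level pipelining.
beta = solve_beta(v0=5.0, vt=0.6, m=3)
print("scaled supply voltage:", beta * 5.0)
print("power relative to the original:", beta ** 2)
```

With m = 1 the function returns 1, i.e. no voltage reduction is possible without pipelining, which is a quick sanity check on the chosen root.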

  21. Example 3.4.1: Reduce Power by Pipelining
     - Consider the following two FIR filters.
       (Figure labels: x(n), y(n), multiplier sections m1 and m2, delays D)
       - What is the supply voltage of the pipelined architecture if the clock periods are identical?
       - What is the relative power consumption?

  22. Solution

  23. Parallel Processing for Low Power (1/2)
     - For an L-parallel architecture,
       - the charging capacitance ($C_{charge}$) remains the same, but the total capacitance ($C_{total}$) is increased L times
     - To maintain the same sample rate,
       - the clock frequency is reduced by a factor of L ($f = 1/(L\,T_{pd})$), which means $C_{charge}$ has L times longer to charge or discharge
       - The supply voltage can be reduced to $\beta V_0$; the power consumption becomes
         $P_{parallel} = (L\,C_{total})\,(\beta V_0)^2\,\dfrac{f}{L} = \beta^2\,P_{non\text{-}parallel}$

  24. Parallel Processing for Low Power (2/2)
     - Propagation delay of the original architecture: $T_{seq} = \dfrac{C_{charge}\,V_0}{k\,(V_0 - V_t)^2}$
     - Propagation delay of the parallel architecture: $L\,T_{seq} = \dfrac{C_{charge}\,\beta V_0}{k\,(\beta V_0 - V_t)^2}$
     - Combining these two expressions gives the following quadratic equation, which can be solved for $\beta$:
       $L\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2$
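
For reference, the quadratic can be expanded and solved in closed form. The worked steps below are a sketch derived only from the equation on this slide; the shorthand $b$ is introduced here for readability and does not appear in the slides.

```latex
\begin{align*}
L\,(\beta V_0 - V_t)^2 &= \beta\,(V_0 - V_t)^2 \\
L V_0^2\,\beta^2 - \bigl[\,2 L V_0 V_t + (V_0 - V_t)^2\,\bigr]\,\beta + L V_t^2 &= 0 \\
\beta &= \frac{b \pm \sqrt{b^2 - 4 L^2 V_0^2 V_t^2}}{2 L V_0^2},
\qquad b = 2 L V_0 V_t + (V_0 - V_t)^2
\end{align*}
```

The product of the two roots is $(V_t/V_0)^2$, so exactly one of them satisfies $\beta V_0 > V_t$. That is the root with the plus sign, and it is the only physically usable one, because the scaled supply must stay above the threshold voltage.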

  25. Example 3.4.2: Reduce Power by Parallel Processing
     - Consider the following two FIR filters, with the critical paths shown as dashed lines.
       (Figure labels: x(n), y(n), x(2k), x(2k+1), y(2k), y(2k+1), delays D)
       - What is the supply voltage of the parallel architecture?
       - What is the relative power consumption?

  26. Solution

  27. Example 3.4.3
     - Area-efficient architecture

  28. Summary
     - In pipelining & parallel processing,
       - M-level pipelining
       - L-level parallel processing
       - Combining M-level pipelining & L-level parallel processing
     - For low power design,
       - Pipelining
       - Parallel processing
       - Combining pipelining and parallel processing
