
Heterogeneous Latch-based Asynchronous Pipelines



  1. Heterogeneous Latch-based Asynchronous Pipelines
     Girish Venkataramani, Tiberiu Chelcea, Seth C. Goldstein
     Presenter: Tobias Bjerregaard
     April 10th, 2008, ASYNC 2008

  2. Outline
     • Introduction
     • Latch Selection Algorithm
     • Experimental Results
     • Conclusions

  3. Motivation
     • Normally open latches are attractive for bundled-data designs, e.g., Mousetrap [Singh, ICCD 01]
       + High-performance: short critical path to open latch
       – Power hungry: data glitches spill over to downstream stages
     [Figure: pipeline stage with Latch, Data, Req, Ack, C, H/S and Delay blocks]

  4. Motivation
     • Self-Resetting (SR) latches address the glitching problem [Chelcea, DAC 07]
       – D-latch closed during active computation (filters glitches)
       – D-latch is opened just before stage computation stabilizes
       + 2x improvement in energy-delay*
       – 10% performance slowdown*
     * Mediabench suite [Lee, Micro 97]
     [Figure: the pipeline stage (Latch, Data, Req, Ack, C, H/S, Delay) shown without and with an SR controller]

  5. Contributions
     • Build heterogeneous pipelines
       – Use D-latches for timing-critical stages
       – Use SR-latches for the rest
     • Module selection problem
       – What is timing critical?
       – When is an SR-latch warranted?
     • Automatic latch selection algorithm
       – Experimental results: heterogeneous pipelines have equivalent performance to D-latches and are more energy-efficient than either homogeneous D-latch or SR-latch pipelines

  6. Outline
     • Introduction
     • Latch Selection Algorithm
     • Experimental Results
     • Conclusions

  7. Latch Selection Algorithm
     • Objectives: get the best of both worlds
       – Performance of D-latches
       – Energy efficiency of SR-latches
     • Approach: balance the use of SR-latches
       – Too many → bigger and slower designs
       – Too few → high energy consumption
     • Algorithm properties: three heuristics used to track timing criticality and estimate the effect of datapath glitches

  8. Power Heuristics
     • Data glitches are proportional to the datapath fanout
       – Use SR-latches if fanout >= 2
     • Protect computation-intensive stages
       – Assign SR-latches to inputs
       – Bit-operations on datapath used to estimate computation intensiveness
     [Figure: stage graph with legend entries "glitch" and "stage"; selected latches marked "sr"]

  9. Timing Criticality
     • SR-latch controllers introduce delay when opening latches
       – Use D-latch if stage is timing critical
     • Determine the system's critical stages using the Global Critical Path (GCP) [Venkataramani, DAC 07]

  10. Timing Analysis
      • Analysis produces steady-state event firing times
        – Events are handshake signal transitions
        – Behaviors are dependence relations between events
      • Cycle time: time difference between successive occurrences of an event
      • Alternative representation of cycle time
        – Set of slack values: time difference between input events
        – Global Critical Path (GCP): longest zero-slack path
        – Global slack: timing budget for GCP tolerance
          • How much can a stage be slowed without changing the GCP

  11. Event Slack
      • Use concept of Time Separation of Events (TSEs) to compute slack, GCP and global slack
      • Example: behavior b has input events e1 and e2, and fires e3
        – e2 fires at time 5; e1, the last-arrival input, fires at time 9
        – Slack(e2, b) = 4
        – Slack(e1, b) = 0
      • A 0-slack input is locally critical [Fields, ISCA 01]
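
The slack values on this slide can be reproduced with a small sketch. It assumes that the slack of an input event is the gap between its firing time and the firing time of the behavior's last-arriving input, which matches the numbers shown. The function name `event_slacks` and the dictionary layout are illustrative and do not come from the talk.

```python
def event_slacks(arrival_times):
    """Return the slack of each input event of a single behavior.

    arrival_times maps an input event name to its firing time.
    Assumption: slack = (time of last-arriving input) - (time of this input).
    """
    last_arrival = max(arrival_times.values())
    return {event: last_arrival - t for event, t in arrival_times.items()}


# Slide example: e2 fires at time 5, e1 (last-arrival input) at time 9.
slacks = event_slacks({"e1": 9, "e2": 5})
print(slacks)  # {'e1': 0, 'e2': 4}

# Inputs with zero slack are the locally critical ones.
locally_critical = [e for e, s in slacks.items() if s == 0]
print(locally_critical)  # ['e1']
```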

  12. Global Critical Path (GCP)
      • GCP is the longest path of zero-slack input events [Venkataramani, DAC 07]
        – Equivalent to the critical cycle
      • Bottom-up computation of cycle time:
        – Length of GCP cycle = cycle time
      GCP is the sequential critical path of the system. It represents the primary performance bottleneck.

  13. Global Slack
      • Minimum cumulative slack to the GCP
        – If event is on the GCP
          • GSlack(b4) = GSlack(b5) = 0
        – Otherwise:
          • GSlack(b2) = 1
          • GSlack(b3) = 4
          • GSlack(b1) = Min(0+4, 2+1) = 3
      Global slack is a measure of how much a behavior can be delayed without affecting global performance.
      [Figure: example graph of behaviors b1 to b5 with slack values on the edges and the GCP marked]
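
The arithmetic on this slide can be sketched as a recursion: a behavior on the GCP has global slack 0, and any other behavior takes the minimum over its outgoing edges of the edge slack plus the successor's global slack. The edge list below is an assumption reconstructed from the slide's numbers (b1 reaches b3 with slack 0 and b2 with slack 2, b2 reaches b4 with slack 1, b3 reaches b5 with slack 4); the function name `global_slack` is hypothetical, and the sketch assumes every off-GCP behavior reaches the GCP without cycles.

```python
from functools import lru_cache


def global_slack(edges, on_gcp):
    """Compute GSlack for every behavior.

    edges:  dict mapping a behavior to a list of (successor, edge_slack).
    on_gcp: set of behaviors that lie on the Global Critical Path.
    Assumption: off-GCP behaviors reach the GCP through an acyclic path.
    """
    @lru_cache(maxsize=None)
    def gslack(b):
        if b in on_gcp:
            return 0
        # Minimum cumulative slack over all ways of reaching the GCP.
        return min(slack + gslack(succ) for succ, slack in edges[b])

    return {b: gslack(b) for b in set(edges) | set(on_gcp)}


# Edge slacks reconstructed from the slide's example (an assumption).
edges = {
    "b1": [("b3", 0), ("b2", 2)],
    "b2": [("b4", 1)],
    "b3": [("b5", 4)],
}
result = global_slack(edges, on_gcp={"b4", "b5"})
print(result["b1"])  # 3, i.e. min(0 + 4, 2 + 1)
```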

  14. Timing Criticality Heuristic
      • Let ∆sr be the delay overhead introduced by SR-latches
      • Iterative algorithm: assign an SR-latch when global slack is larger than ∆sr
        – Update timing
        – Repeat; look for more opportunities

  15. Latch Selection Algorithm Overview
      • Input: G = (V, E)
      • Add all v in V to V_sr, if Fanout(v) >= 2
      • Add all v in V to V_sr, if (v,u) is in E and BitOps(u) >= BO_max
      • v_best = stage in (V – V_sr) with most GSlack
      • Is GSlack(v_best) > ∆sr?
        – Yes: add v_best to V_sr, update GSlack, and pick the next v_best
        – No: output V_sr
      Complexity: O(|V||E| + |V|^2)
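
The flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the function name `select_sr_latches`, the argument layout, and in particular the `update_gslack` callback are hypothetical. A real flow would rerun timing analysis after each assignment; the toy `naive_update` below only charges the overhead to the chosen stage.

```python
def select_sr_latches(stages, edges, bit_ops, gslack, delta_sr, bo_max,
                      update_gslack):
    """Return the set V_sr of stages that receive SR-latches.

    stages:  list of stage names (V)
    edges:   list of (v, u) pairs (E)
    bit_ops: dict stage -> estimated bit-operations
    gslack:  dict stage -> initial global slack
    update_gslack: hypothetical callback (gslack, v_best, delta_sr) -> gslack
    """
    fanout = {v: 0 for v in stages}
    for src, _dst in edges:
        fanout[src] += 1

    # Power heuristic 1: high-fanout stages spread glitches widely.
    v_sr = {v for v in stages if fanout[v] >= 2}
    # Power heuristic 2: protect inputs of computation-intensive stages.
    v_sr |= {v for v, u in edges if bit_ops[u] >= bo_max}

    gslack = dict(gslack)
    while True:
        candidates = [v for v in stages if v not in v_sr]
        if not candidates:
            break
        v_best = max(candidates, key=lambda v: gslack[v])
        if gslack[v_best] <= delta_sr:
            break  # no remaining stage can absorb the SR overhead
        v_sr.add(v_best)
        gslack = update_gslack(gslack, v_best, delta_sr)
    return v_sr


def naive_update(gslack, v_best, delta_sr):
    # Toy stand-in for timing re-analysis: only the chosen stage pays.
    return {**gslack, v_best: gslack[v_best] - delta_sr}


chosen = select_sr_latches(
    stages=["a", "b", "c", "d", "e"],
    edges=[("a", "b"), ("a", "c"), ("b", "d"), ("c", "e")],
    bit_ops={"a": 1, "b": 1, "c": 1, "d": 32, "e": 1},
    gslack={"a": 0, "b": 0, "c": 5, "d": 0, "e": 1},
    delta_sr=2,
    bo_max=16,
    update_gslack=naive_update,
)
print(sorted(chosen))  # ['a', 'b', 'c']
```

In this toy run, `a` is picked by the fanout rule, `b` because it feeds the compute-intensive stage `d`, and `c` because its global slack (5) exceeds the assumed overhead (2); `d` and `e` keep D-latches.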

  16. Outline
      • Introduction
      • Latch Selection Algorithm
      • Experimental Results
      • Conclusions

  17. Experimental Setup
      • Implemented latch selection algorithm within CASH, a compiler synthesizing 4-phase bundled circuits from C [IWLS 04]
      • Applied on 15 Mediabench kernels [Lee, Micro 97]
      • Circuits mapped to [180nm/2V] STMicro standard-cell library
      • Synopsys DC used to estimate energy, Modelsim used for gate-level timing estimation

  18. Impact of Heuristics
      [Figure: stacked bar chart per kernel (K1.adpcm_d to K15.pgp_e); y-axis: % contribution, 0% to 100%; legend: D-Latch Ops, and SR-Latch Stages split into Gslack, Compute-intensive and Fanout]
      Combined effect of heuristics contributes to energy efficiency

  19. Energy-Delay
      [Figure: bar chart of energy-delay as a ratio to D-latch (y-axis 0 to 3) for Heterogeneous and SR-Latch pipelines, per kernel (K1.adpcm_d to K15.pgp_e) plus GM]

  20. End-to-End Execution Time
      [Figure: bar chart of end-to-end execution time as a ratio to D-latch (y-axis 0.6 to 1.1) for Heterogeneous and SR-Latch pipelines, per kernel (K1.adpcm_d to K15.pgp_e) plus GM]

  21. Outline
      • Introduction
      • Latch Selection Algorithm
      • Experimental Results
      • Conclusions

  22. Conclusions
      • D-latches are power-hungry and SR-latches are slow for bundled-data pipelines
      • Heterogeneous latch selection algorithm
        – Global slack to guide timing-critical selection
        – Simple heuristics to guide power-critical selection
      • Heterogeneous latch pipelines are more energy-efficient than either homogeneous D-latch or homogeneous SR-latch pipelines

  23. Thank You! Questions?

  24. Self-Resetting (SR) Latches [Chelcea, DAC 07]
      • Area and control-path power overhead
      • Timing overhead
      • Benefit: datapath power savings
      [Figure: SR controller (SR cntrl) and D-latch, with labels trigger, En, Done, C, Din and Dout]

  25. SR-latch behavior
      • Eliminate glitches:
        – open only after data is ready
        – close as soon as data latched
      • Eliminate overheads:
        – open before handshake starts
      [Figure: STG specification [Chelcea, DAC 07] with transitions En+, EnSR+, Done+, En-, EnSR-, Done-; annotations: data ready, open latches to pass data, when data latched, close the latches]
