
An Efficient Performance Improvement Method Utilizing Specialized Functional Units in Behavioral Synthesis



  1. An Efficient Performance Improvement Method Utilizing Specialized Functional Units in Behavioral Synthesis. Tsuyoshi Sadakata and Yusuke Matsunaga, Kyusyu University, Japan

  2. Motivation
     • Specialized Functional Units (SFUs), e.g. a multiply-accumulator, can be designed for specific operation patterns to achieve shorter delay and/or smaller area than cascaded basic functional units (e.g. a multiplier followed by an adder)
     • Introducing SFUs into behavioral synthesis can improve synthesis results
     • Because SFUs are less flexible for resource sharing, utilizing them in behavioral synthesis while considering the performance/area trade-off is a complicated problem

  3. Related Works
     • Integer Linear Programming based methods (long computational time can be required for large problems)
       – Landwehr et al., "Oscar: optimum simultaneous scheduling, allocation and resource binding based on integer programming", EuroDAC94
       – Marwedel et al., "Built-in chaining: Introducing complex components into architectural synthesis", ASPDAC97
     • Heuristic methods (maximize performance while ignoring the increase of resources)
       – Corazao et al., "Performance optimization using template mapping for datapath-intensive high-level synthesis", IEEE Trans. on CAD96
       – Bringmann et al., "Cross-level hierarchical high-level synthesis", DATE98

  4. Proposed Method
     • A heuristic method utilizing SFUs for the simultaneous Module Selection, Functional Unit Allocation, and Scheduling problem, considering the performance/area trade-off
       – Constraint: clock cycle time and total functional unit area
       – Objective: minimize the # of clock cycles
       – Approach
         1. Enumerate several feasible solutions at Module Selection
         2. Solve the other sub-problems for each solution of Module Selection
     • Main contribution: a novel heuristic Module Selection algorithm that restricts the enumerated solutions effectively
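The two-step approach on this slide can be sketched as a simple outer loop. This is a minimal illustration, not the authors' implementation: `pick_best` and `toy_solve` are hypothetical names, and `toy_solve` is a made-up stand-in for the Functional Unit Allocation and Scheduling sub-problems, which the slides do not spell out.

```python
from typing import Callable, Iterable, Optional, Tuple

MSV = Tuple[int, ...]  # one count per functional unit type


def pick_best(
    candidates: Iterable[MSV],
    solve: Callable[[MSV], Optional[int]],
) -> Optional[Tuple[MSV, int]]:
    """Run the remaining sub-problems for each Module Selection solution
    and keep the one with the fewest clock cycles."""
    best: Optional[Tuple[MSV, int]] = None
    for msv in candidates:
        cycles = solve(msv)  # None means "no valid schedule for this msv"
        if cycles is None:
            continue
        if best is None or cycles < best[1]:
            best = (msv, cycles)
    return best


def toy_solve(msv: MSV) -> Optional[int]:
    """Hypothetical stand-in for allocation + scheduling:
    8 additions and 4 multiplications, perfectly parallelizable."""
    adders, multipliers = msv
    if adders == 0 or multipliers == 0:
        return None
    # Ceiling division without importing math.
    return -(-8 // adders) + -(-4 // multipliers)


print(pick_best([(1, 1), (2, 1), (1, 2), (0, 3)], toy_solve))
```

The cost of this loop grows with the number of candidates passed in, which is why restricting the enumeration at Module Selection matters.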

  5. Module Selection Sub-Problem
     • Enumerate several feasible Module Set Vectors satisfying the clock cycle time and total functional unit area constraints
     • Module Set Vector (MSV): $msv = (n_1, n_2, \ldots, n_{|FU|})$
       – $FU$: a set of functional unit types
       – $n_i$: selected # of the $i$-th functional unit type
       – $msv[i]$: notation for the $i$-th element ($n_i$)
     • Feasible Module Set Vector (FMSV)
       – The synthesis target can be implemented with the msv
       – The msv satisfies the given constraints
     • Inclusion relation between MSVs: $msv$ is included in $msv'$ $\Leftrightarrow$ $\forall i = 1, 2, \ldots, |FU|:\ msv[i] \le msv'[i]$
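The inclusion relation and the area part of the feasibility test translate directly into code. The sketch below assumes an MSV is a tuple of counts and that each functional unit type has a fixed area; the function names and the area values are illustrative only. The other feasibility conditions (the synthesis target being implementable with the msv, and the clock cycle time constraint) are not modeled here.

```python
from typing import Sequence, Tuple

MSV = Tuple[int, ...]  # msv[i] = selected number of the i-th unit type


def is_included(msv: MSV, other: MSV) -> bool:
    """True if msv is included in other: msv[i] <= other[i] for every i."""
    if len(msv) != len(other):
        raise ValueError("MSVs must range over the same set FU")
    return all(a <= b for a, b in zip(msv, other))


def total_area(msv: MSV, unit_areas: Sequence[float]) -> float:
    """Sum of (count * area) over all functional unit types."""
    return sum(n * area for n, area in zip(msv, unit_areas))


def satisfies_area(msv: MSV, unit_areas: Sequence[float], limit: float) -> bool:
    """Area half of the FMSV test; implementability is checked elsewhere."""
    return total_area(msv, unit_areas) <= limit


# Hypothetical library: (adder, multiplier, multiply-accumulator) areas.
areas = (100.0, 400.0, 450.0)
print(is_included((1, 0, 2), (1, 1, 2)))         # componentwise <=
print(satisfies_area((2, 1, 1), areas, 1000.0))  # 200 + 400 + 450 > 1000
```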

  6. Proposed Module Selection Algorithm
     • Only maximal FMSVs are enumerated, i.e. only FMSVs close to the constraint boundary
       – maximal FMSV: an FMSV that no other FMSV includes
     • Maximal FMSVs are divided into several groups based on unit FMSVs
       – unit FMSV: $msv_{unit}[i] = 0$ if $msv_{maximal}[i] = 0$, and $msv_{unit}[i] = 1$ if $msv_{maximal}[i] \ge 1$
     • For each group, the minimum # of cycles is estimated with only the unit FMSV (estimated result obtained by As Soon As Possible scheduling)
     • From the unit FMSV with the best estimated value, a constant number of maximal FMSVs are enumerated
     [Figure: # of cycles versus total area, marking the total area of the unit FMSV and the total area constraint]
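The definitions on this slide (maximal FMSV, unit FMSV, grouping, and picking the best group) can be illustrated on a toy instance. This is a sketch under stated assumptions, not the proposed algorithm itself: the brute-force enumeration of all feasible MSVs is exactly what the method is meant to avoid, feasibility is reduced to an area check, and the cycle estimates in `toy_estimates` are invented numbers standing in for the ASAP-scheduling estimate.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

MSV = Tuple[int, ...]


def is_included(msv: MSV, other: MSV) -> bool:
    return all(a <= b for a, b in zip(msv, other))


def maximal_fmsvs(fmsvs: Iterable[MSV]) -> List[MSV]:
    """Keep only FMSVs that no other (distinct) FMSV includes."""
    unique = set(fmsvs)
    return sorted(
        m for m in unique
        if not any(m != o and is_included(m, o) for o in unique)
    )


def unit_fmsv(msv: MSV) -> MSV:
    """0 where the count is 0, 1 where the count is >= 1."""
    return tuple(1 if n >= 1 else 0 for n in msv)


def group_by_unit(maximal: Iterable[MSV]) -> Dict[MSV, List[MSV]]:
    groups: Dict[MSV, List[MSV]] = defaultdict(list)
    for m in maximal:
        groups[unit_fmsv(m)].append(m)
    return dict(groups)


def select_candidates(
    groups: Dict[MSV, List[MSV]],
    estimate_cycles: Callable[[MSV], int],
    limit: int,
) -> List[MSV]:
    """Take up to `limit` maximal FMSVs from the group whose unit FMSV
    has the smallest estimated # of cycles."""
    best_unit = min(groups, key=estimate_cycles)
    return groups[best_unit][:limit]


# Toy instance: two unit types with areas 3 and 2, area limit 6.
unit_areas, area_limit = (3, 2), 6
feasible = [
    m for m in product(range(3), range(4))
    if sum(n * a for n, a in zip(m, unit_areas)) <= area_limit
]
maximal = maximal_fmsvs(feasible)
groups = group_by_unit(maximal)
toy_estimates = {(0, 1): 9, (1, 1): 6, (1, 0): 7}  # invented cycle estimates
print(maximal)
print(select_candidates(groups, toy_estimates.__getitem__, limit=1000))
```

Estimating once per unit FMSV rather than once per maximal FMSV keeps the number of scheduling runs proportional to the number of groups, which is at most $2^{|FU|}$ but usually far smaller than the number of maximal FMSVs.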

  7. Experiment
     • The effect of utilizing SFUs is evaluated in two ways
       – ALL: a heuristic method that enumerates all maximal FMSVs
       – OUR: a heuristic method with the proposed algorithm
     • Synthesis targets
       – bdist2 (# of operations: 43, MediaBench: MPEG2 Encoder)
       – fdct (# of operations: 138, MediaBench: JPEG Encoder)
     • Functional unit library
       – Basic functional units (e.g. adder, multiplier)
       – SFUs: Carry-Save Adder based construction algorithm for addition-based operations (provided by Synopsys Module Compiler)
       – All units were synthesized with Synopsys Module Compiler under a maximum delay constraint of 3 ns or 6 ns, with a cell library for the HITACHI 0.18um CMOS process technology provided by VDEC
     • Constant number for the enumeration of maximal FMSVs with the proposed algorithm: 1,000

  8. Experimental Results
     [Figure: two plots of # of clock cycles versus total area constraint (um^2), one for bdist2 and one for fdct, both with a clock cycle time constraint of 6 ns. Series: ALL without SFUs, ALL with SFUs, OUR without SFUs, OUR with SFUs. Annotations: "The result cannot be obtained without SFUs" and "The result can be obtained with SFUs".]
     • OUR with SFUs: ave. 17.5%, max. 35.7% reduction (bdist2); ave. 10.4%, max. 15.9% reduction (fdct)
     • Computational time comparison
       – ALL with SFUs: max. 7,588 sec (bdist2), max. 8,218 sec (fdct)
       – OUR with SFUs: max. 149 sec (bdist2), max. 857 sec (fdct)
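As a quick check on the computational time comparison, the ratio of the reported maximum runtimes can be worked out directly. The numbers below are taken from this slide; the dictionary layout and the `ratio` helper are illustrative. Because these are maxima over the tested area constraints, the ratios compare worst cases and are not per-instance speedups.

```python
# Maximum reported runtimes in seconds, with SFUs enabled.
max_runtime_sec = {
    "bdist2": {"ALL": 7588, "OUR": 149},
    "fdct": {"ALL": 8218, "OUR": 857},
}


def ratio(benchmark: str) -> float:
    """Worst-case runtime of ALL divided by worst-case runtime of OUR."""
    times = max_runtime_sec[benchmark]
    return times["ALL"] / times["OUR"]


for name in max_runtime_sec:
    print(f"{name}: ALL/OUR = {ratio(name):.1f}x")

# The slowest OUR run stays under 15 minutes (900 seconds).
slowest_our = max(t["OUR"] for t in max_runtime_sec.values())
print(slowest_our, slowest_our < 15 * 60)
```

This gives roughly 50.9x for bdist2 and 9.6x for fdct, and the slowest OUR run (857 sec) is under 15 minutes.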

  9. Conclusion
     • An efficient performance improvement method utilizing SFUs is proposed
     • Performance improvement under clock cycle time and total functional unit area constraints can be achieved in practical time with the proposed method
     • Experimental results show that utilizing specialized functional units achieved a 13.3% average and 35.7% maximum reduction in the # of clock cycles, within 15 minutes

  10. Thank you for your attention.
