An Efficient Performance Improvement Method Utilizing Specialized - - PowerPoint PPT Presentation

SLIDE 1

An Efficient Performance Improvement Method Utilizing Specialized Functional Units in Behavioral Synthesis

Tsuyoshi Sadakata and Yusuke Matsunaga
Kyusyu University, Japan

SLIDE 2

Motivation

  • Specialized Functional Units (SFUs) (e.g. Multiply-Accumulator) can be designed for specific operation patterns to achieve shorter delay and/or smaller area than cascaded basic functional units (e.g. Multiplier & Adder)

  • Introducing SFUs into behavioral synthesis can improve synthesis results

  • Because SFUs are less flexible for resource sharing, utilizing Specialized Functional Units in behavioral synthesis while considering the performance and area trade-off is a complicated problem

SLIDE 3

Related Works

  • Integer Linear Programming based Methods

– Landwehr et al., "Oscar: optimum simultaneous scheduling, allocation and resource binding based on integer programming", EuroDAC94
– Marwedel et al., "Built-in chaining: Introducing complex components into architectural synthesis", ASPDAC97
– Long computational time can be required for large problems

  • Heuristic Methods

– Corazao et al., "Performance optimization using template mapping for datapath-intensive high-level synthesis", IEEE Trans. on CAD96
– Bringmann et al., "Cross-level hierarchical high-level synthesis", DATE98
– Maximizing performance ignoring the increase of resources

SLIDE 4

Proposed Method

  • A heuristic method utilizing SFUs for a simultaneous Module Selection, Functional Unit Allocation, and Scheduling problem considering the performance/area trade-off

– Constraint: clock cycle time & total functional unit area
– Objective: minimize # of clock cycles
– Approach
    1. enumerate several feasible solutions at Module Selection
    2. solve other sub-problems for each solution of Module Selection

  • Main Contribution

– Proposal of a novel heuristic Module Selection algorithm to restrict enumerated solutions effectively
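
The two-step approach above can be read as an outer loop over Module Selection candidates with an inner solver for the remaining sub-problems. The following is a minimal sketch of that control flow only, not the authors' implementation: the tuple encoding of a candidate, the `synthesize` function, and the `toy_solver` stand-in are all hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# Illustrative encoding: one instance count per functional unit type.
MSV = Tuple[int, ...]


def synthesize(
    candidates: Iterable[MSV],
    allocate_and_schedule: Callable[[MSV], Optional[int]],
) -> Optional[Tuple[MSV, int]]:
    """Run step 2 for every candidate from step 1; keep the fewest clock cycles.

    allocate_and_schedule is a hypothetical solver for the remaining
    sub-problems; it returns a cycle count, or None if no schedule exists.
    """
    best: Optional[Tuple[MSV, int]] = None
    for msv in candidates:
        cycles = allocate_and_schedule(msv)
        if cycles is not None and (best is None or cycles < best[1]):
            best = (msv, cycles)
    return best


def toy_solver(msv: MSV) -> Optional[int]:
    """Toy stand-in: pretend the cycle count is ceil(12 / total unit count)."""
    units = sum(msv)
    return None if units == 0 else -(-12 // units)


result = synthesize([(1, 1, 0), (2, 1, 0), (1, 0, 1), (0, 0, 0)], toy_solver)
print(result)
```

The cost of this loop grows with the number of candidates handed to it, which is why restricting the enumeration at Module Selection matters.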

SLIDE 5

Module Selection Sub-Problem

  • Enumerate several feasible Module Set Vectors satisfying the clock cycle time & total functional unit area constraint

Module Set Vector (MSV)

$$msv = (n_1, n_2, \ldots, n_{|FU|})$$

– $FU$: a set of functional unit types
– $n_i$: # of selected units of the $i$-th functional unit type
– $msv[i]$: notation for the $i$-th element ($n_i$)

Inclusion Relation between MSVs

$$msv \text{ is included in } msv' \iff msv[i] \le msv'[i], \quad i = 1, 2, \ldots, |FU|$$

Feasible Module Set Vector (FMSV)

  • Synthesis target can be implemented with the msv
  • The msv satisfies given constraint
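
The definitions above can be written out as small predicates. This is a sketch under stated assumptions rather than the authors' check: the implementability test is simplified to "every operation kind has at least one selected unit type that can execute it", the clock cycle time constraint is assumed to be handled by pre-filtering the library, and the area values and `CAPABLE` table are made-up illustration data.

```python
from typing import Sequence, Set


def is_included(msv: Sequence[int], other: Sequence[int]) -> bool:
    """msv is included in other iff msv[i] <= other[i] for every unit type i."""
    return len(msv) == len(other) and all(a <= b for a, b in zip(msv, other))


def total_area(msv: Sequence[int], unit_area: Sequence[float]) -> float:
    """Total functional unit area: instance count times per-type area, summed."""
    return sum(n * a for n, a in zip(msv, unit_area))


def is_feasible(
    msv: Sequence[int],
    unit_area: Sequence[float],
    area_limit: float,
    capable_types: Sequence[Set[int]],
) -> bool:
    """Simplified FMSV test (an assumption, not the authors' exact check).

    capable_types[k] holds the unit type indices able to execute operation
    kind k; the target counts as implementable if every kind has a selected type.
    """
    implementable = all(any(msv[i] >= 1 for i in types) for types in capable_types)
    return implementable and total_area(msv, unit_area) <= area_limit


# Made-up library: index 0 = adder, 1 = multiplier, 2 = multiply-accumulator.
AREA = [1000.0, 8000.0, 8500.0]  # um^2, illustrative values only
CAPABLE = [{0}, {1, 2}]          # kind 0 = add, kind 1 = multiply (toy data)

print(is_included((1, 1, 0), (2, 1, 0)))
print(is_feasible((2, 1, 0), AREA, 12000.0, CAPABLE))
print(is_feasible((2, 1, 1), AREA, 12000.0, CAPABLE))
```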

SLIDE 6

Proposed Module Selection Algorithm

  • Only maximal FMSVs are enumerated

– maximal FMSV: no other FMSV includes the msv

  • maximal FMSVs are divided into several groups based on unit FMSVs

– unit FMSV:

$$msv_{unit}[i] = \begin{cases} 0 & (msv_{maximal}[i] = 0) \\ 1 & (msv_{maximal}[i] \ge 1) \end{cases}$$
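
To make the two definitions concrete, the sketch below derives the unit FMSV of a vector, filters an explicit list down to its maximal members, and groups them by unit FMSV. It only illustrates the definitions: the brute-force filter assumes the full FMSV list is already available, which is exactly what the proposed algorithm avoids, and the function names and sample vectors are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MSV = Tuple[int, ...]


def unit_of(msv: MSV) -> MSV:
    """Unit FMSV: 1 where the vector selects at least one unit, else 0."""
    return tuple(1 if n >= 1 else 0 for n in msv)


def maximal_only(fmsvs: List[MSV]) -> List[MSV]:
    """Keep the FMSVs that no other FMSV in the list includes (brute force)."""
    def included(a: MSV, b: MSV) -> bool:
        return all(x <= y for x, y in zip(a, b))

    return [m for m in fmsvs if not any(m != o and included(m, o) for o in fmsvs)]


def group_by_unit(maximal: List[MSV]) -> Dict[MSV, List[MSV]]:
    """Divide maximal FMSVs into groups keyed by their unit FMSV."""
    groups: Dict[MSV, List[MSV]] = defaultdict(list)
    for m in maximal:
        groups[unit_of(m)].append(m)
    return dict(groups)


# Made-up feasible vectors over three unit types.
fmsvs = [(1, 1, 0), (2, 1, 0), (1, 2, 0), (1, 0, 1), (2, 0, 1)]
maximal = maximal_only(fmsvs)
groups = group_by_unit(maximal)
print(maximal)
print(groups)
```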

Only FMSVs close to the constraint boundary are enumerated.

For each group, the minimum # of cycles is estimated with only the unit FMSV.

[Figure: total area vs. # of cycles, marking the total area of the unit FMSV, the constraint, and the estimated value (result obtained by As Soon As Possible scheduling)]

From the unit FMSV with the best estimated value, a constant number of maximal FMSVs are enumerated.
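
One plausible reading of the estimate is that As Soon As Possible scheduling is run with no limit on the number of instances of each type in the unit FMSV, which gives the fewest cycles any FMSV in that group could reach. The sketch below shows ASAP scheduling on a toy data-flow graph under that reading; the graph encoding, the one-cycle latencies, and the assumption that a multiply-accumulator fits in one clock cycle are all illustrative and not taken from the slides.

```python
from typing import Dict, List


def asap_cycles(preds: Dict[str, List[str]], latency: Dict[str, int]) -> int:
    """Schedule length when every operation starts as soon as its inputs are ready.

    preds maps each operation to its predecessors (acyclic graph assumed);
    latency gives the number of cycles each operation occupies.
    """
    finish: Dict[str, int] = {}

    def finish_time(op: str) -> int:
        if op not in finish:
            start = max((finish_time(p) for p in preds[op]), default=0)
            finish[op] = start + latency[op]
        return finish[op]

    return max((finish_time(op) for op in preds), default=0)


# Toy chain mul -> add -> mul -> add using only basic units (1 cycle each).
basic = {"m1": [], "a1": ["m1"], "m2": ["a1"], "a2": ["m2"]}
basic_cycles = asap_cycles(basic, {op: 1 for op in basic})

# Same computation with each mul/add pair mapped to one multiply-accumulator.
fused = {"mac1": [], "mac2": ["mac1"]}
fused_cycles = asap_cycles(fused, {op: 1 for op in fused})

print(basic_cycles, fused_cycles)
```

Under this reading, the group whose unit FMSV yields the smallest ASAP length would be the one expanded into maximal FMSVs first.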

SLIDE 7

Experiment

  • Effect of utilizing SFUs is evaluated in two ways

– ALL: a heuristic method that enumerates all maximal FMSVs
– OUR: a heuristic method with the proposed algorithm

  • Synthesis Target

– bdist2 (# of operations: 43, MediaBench: MPEG2 Encoder)
– fdct (# of operations: 138, MediaBench: JPEG Encoder)

  • Functional Unit Library

– Basic functional units (e.g. adder, multiplier)
– SFU: Carry-Save Adder based construction algorithm for addition based operations (provided by Synopsys Module Compiler)
– All units were synthesized with Synopsys Module Compiler under a maximum delay constraint of 3 ns or 6 ns with a cell library for HITACHI 0.18um CMOS process technology provided by VDEC

  • Constant number for the enumeration of maximal FMSVs with the proposed algorithm

– 1,000

SLIDE 8

Experimental Results

[Figure: # of clock cycles (bdist2, clock cycle time constraint: 6ns). x-axis: total area constraint (um^2); y-axis: # of cycles; series: ALL without SFUs, ALL with SFUs, OUR without SFUs, OUR with SFUs]

[Figure: # of clock cycles (fdct, clock cycle time constraint: 6ns). x-axis: total area constraint (um^2); y-axis: # of cycles; series: ALL without SFUs, ALL with SFUs, OUR without SFUs, OUR with SFUs]

Chart annotations:

  • OUR with SFUs: ave. 17.5%, max. 35.7% reduction
  • OUR with SFUs: ave. 10.4%, max. 15.9% reduction
  • The result can be obtained with SFUs
  • The result cannot be obtained without SFUs

Computational Time Comparison

  • ALL with SFUs: max. 7,588 sec (bdist2), max. 8,218 sec (fdct)
  • OUR with SFUs: max. 149 sec (bdist2), max. 857 sec (fdct)
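
The reported maximum computational times can be compared with a short calculation. This is only arithmetic on the four numbers from the slide; since each figure is a maximum over the tested constraints, the ratio of maxima is an indication and not necessarily the speedup at any single constraint setting.

```python
# Maximum computational times in seconds, as reported on the slide.
runtime_sec = {
    "bdist2": {"ALL": 7588, "OUR": 149},
    "fdct": {"ALL": 8218, "OUR": 857},
}

# Ratio of the two maxima per benchmark, and OUR's maximum in minutes.
speedup = {name: t["ALL"] / t["OUR"] for name, t in runtime_sec.items()}
our_minutes = {name: t["OUR"] / 60 for name, t in runtime_sec.items()}

for name in runtime_sec:
    print(f"{name}: ratio of maxima {speedup[name]:.1f}x, "
          f"OUR max {our_minutes[name]:.1f} min")
```

Both OUR maxima stay below 15 minutes, with fdct the larger of the two at 857 sec.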

SLIDE 9

Conclusion

  • An efficient performance improvement method utilizing SFUs is proposed

  • Performance improvement under clock cycle time and total functional unit area constraints can be achieved in practical time with the proposed method

  • Experimental results show that utilizing specialized functional units achieved a 13.3% average and 35.7% maximum reduction in # of clock cycles within 15 minutes

SLIDE 10

Thank you for your attention.