SLIDE 1

7th June 2003 Workshop on Complexity Effective Design 1

Evaluating Compiler Support for Complexity Effective Network Processing

Pradeep Rao and S.K. Nandy Computer Aided Design Laboratory. SERC, Indian Institute of Science.

pradeep,nandy@cadl.iisc.ernet.in http://www.serc.iisc.ernet.in/cadl/

SLIDE 2

Outline

  • Why network processors (NPs)?
    – Why complexity effective NPs?
  • NP design issues
  • Statically scheduled processors for NPs
    – Compiler optimizations
      • Classical
      • Superblock
      • Hyperblock
    – Performance data

SLIDE 3

Network Processors

  • Why do we need network processors?
    – Significant time spent in the protocol stack
    – Increasing data rates
      • Increased performance requirements
    – New protocols and services
      • Software-based functionality
      • Flexible (vs. ASIC)
      • Faster time to market
  • Players
    – Cisco, Intel IXP, IBM PowerNP, Motorola (C-Port) C5, Broadcom, ClearWater ...

SLIDE 4

Complexity Effective NPs

  • Complexity-effective hardware
    – Low design, verification and testing times
    – Impacts time to market
    – Low power
      • Fixed power budgets for line cards
      • Network-enabled mobile devices
    – Performance goals met?
  • Performance
    – Exploit parallelism
    – Push clock frequencies

SLIDE 5

NP Design Issues

  • System design: organization of memory, interconnection, processing element (PE) and its local memory …
  • Inadequate performance data for the design of future network processors

SLIDE 6

Static Scheduling for NPs

  • Keep hardware simple by offloading complexity onto the compiler
  • The compiler has a ‘global’ view of the program
  • Performance data for
    – In-order superscalar (IOS)
    – VLIW

SLIDE 7

Methodology

  • IMPACT toolset (UIUC)
  • Architectures
    – In-order superscalar
    – VLIW
  • Compiler optimizations
    – Classical
    – Superblock
    – Hyperblock
  • Applications
    – Checksum computation: crc
    – Deficit round robin scheduling: drr
    – Shortest path computation: dijkstra
    – Diffie-Hellman public key encryption/decryption: dh
    – Reed-Solomon codec: reed_enc, reed_dec
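
To give a feel for the kind of kernel being compiled, deficit round robin scheduling can be sketched as follows. This is a minimal illustration and not the benchmark source: the function name `drr_schedule`, the list-of-packet-sizes queue layout, and the quantum value in the usage line are all assumptions made here.

```python
from collections import deque


def drr_schedule(queues, quantum):
    """Return (flow index, packet size) pairs in transmission order.

    queues  -- one list of packet sizes in bytes per flow (hypothetical layout)
    quantum -- bytes of credit a backlogged flow receives on each visit
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    flows = [deque(q) for q in queues]
    deficit = [0] * len(flows)
    sent = []
    while any(flows):  # loop until every flow queue is empty
        for i, q in enumerate(flows):
            if not q:
                continue
            deficit[i] += quantum
            # Send head-of-line packets while the flow has enough credit.
            while q and q[0] <= deficit[i]:
                size = q.popleft()
                deficit[i] -= size
                sent.append((i, size))
            if not q:
                deficit[i] = 0  # an emptied flow does not carry credit forward
    return sent


order = drr_schedule([[300, 300], [500], [100, 100, 100]], quantum=400)
```

The inner loop is short and branch-heavy, which is the sort of code where larger scheduling regions (superblocks, hyperblocks) are expected to matter.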

SLIDE 8

The Superblock

  • Essentially a trace with a single entry and multiple exits
  • Reduces the bookkeeping required to support side entrances
  • Code motion with compiler-controlled speculation
  • General speculation model for minimal hardware support
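
The single-entry property is commonly obtained by tail duplication: the part of the trace reachable through a side entrance is copied, and the side entrance is redirected to the copy. The sketch below illustrates that idea on a toy control-flow graph. It is not the IMPACT implementation; the function name `form_superblock`, the dict-of-successors CFG representation, and the `'` suffix for duplicated blocks are assumptions made for illustration.

```python
def form_superblock(cfg, trace):
    """Remove side entrances into `trace` by tail duplication.

    cfg   -- dict mapping block name -> list of successor block names
    trace -- list of block names forming the selected trace (each once)
    Returns a new CFG; duplicated blocks are named with a trailing "'".
    """
    cfg = {b: list(s) for b, s in cfg.items()}
    position = {b: i for i, b in enumerate(trace)}

    def is_side_entrance(pred, i):
        # An edge into trace[i] is a side entrance unless it comes from
        # the preceding trace block; edges into the head are always fine.
        return i > 0 and pred != trace[i - 1]

    # Find the earliest trace block that has a side entrance.
    first = None
    for pred, succs in cfg.items():
        for s in succs:
            i = position.get(s)
            if i is not None and is_side_entrance(pred, i):
                first = i if first is None else min(first, i)
    if first is None:
        return cfg  # the trace is already a superblock

    # Duplicate the tail; edges inside the tail stay inside the copy.
    tail = trace[first:]
    copy = {b: b + "'" for b in tail}
    for b in tail:
        cfg[copy[b]] = [copy.get(s, s) for s in cfg[b]]

    # Redirect every side entrance to the duplicated tail.
    duplicates = set(copy.values())
    for pred in list(cfg):
        if pred in duplicates:
            continue
        cfg[pred] = [
            copy[s] if s in copy and is_side_entrance(pred, position[s]) else s
            for s in cfg[pred]
        ]
    return cfg


diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
superblock_cfg = form_superblock(diamond, ["A", "B", "D", "E"])
```

In the diamond example the edge C to D is the side entrance, so D and E are duplicated and C is redirected to the copy. The duplicated code is one plausible source of the extra instruction-cache pressure reported later in the deck.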

SLIDE 9

The Hyperblock

  • Adds predicated execution to superblocks
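
Predicated execution lets the compiler replace a conditional branch with operations guarded by a predicate (if-conversion). The sketch below models the transformation in Python. It is only an analogy, since Python has no predicated instructions; the function names and the absolute-difference example are assumptions made for illustration.

```python
def branchy(a, b):
    # Original control flow: a conditional branch selects one of two paths.
    if a > b:
        r = a - b
    else:
        r = b - a
    return r


def if_converted(a, b):
    # After if-conversion: a compare defines a predicate and its complement,
    # both paths are issued in straight-line order, and each result is
    # committed only when its guarding predicate is true.
    p = a > b                  # predicate define
    q = not p                  # complementary predicate
    r = 0
    r = (a - b) if p else r    # (p) r = a - b
    r = (b - a) if q else r    # (q) r = b - a
    return r
```

Both paths now occupy issue slots on every execution, which is consistent with the predication overhead at low issue widths noted in the evaluation.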

SLIDE 10

Application Characteristics

  • Op-code frequencies
    – 40% integer operations
      • Additions and shifts account for > 80% of ops
    – SB optimizations do not change the op-code frequencies
      • No additional stress on resources
    – HB optimizations reduce conditional branches by if-conversion
      • Predicate instructions account for 0-37%

SLIDE 11

Application Characteristics…

  • Branch statistics
    – Avg. branch prediction accuracy: 92.32%, with < 9% deviation
    – Branch prediction accuracy for SB and HB is higher

SLIDE 12

Application Characteristics…

  • Block size
    – Indicative of potential parallelism
    – BB avg.: 5 instructions
    – SB/HB avg.: 13 instructions

SLIDE 13

Application Characteristics…

  • Cache performance
    – Effect of SB/HB on cache performance
    – D$ unaffected
    – I$: for equivalent cache sizes, the miss rate increases by 40%

SLIDE 14

Architectural Evaluation…

  • Speedup plots with perfect caches for VLIW
  • Up to 2.4x speedup with SB/HB optimization
  • Predication overhead at low issue widths
  • Performance gain from HB (over SB) at high issue widths is < 8%
  • Leveling indicates a decrease in processor utilization

SLIDE 15

Architectural Evaluation…

  • Effect of real cache
    – Greater impact on VLIW than IOS
    – However, the performance benefit of IOS over VLIW is less than 1.8%, suggesting VLIW for complexity effective designs
    – Average network rates of 6.6 Gbps @ 500 MHz for drr

          IOS      VLIW
    BB    1.06%    1.08%
    SB    5.6%     6.8%
    HB    5.6%     7.4%
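
The drr figure can be restated per clock cycle, which is often easier to compare across designs. The helper names below are hypothetical, and the 64-byte packet size in the usage lines is an assumption chosen for illustration; it does not come from the slides.

```python
def bits_per_cycle(rate_bps, clock_hz):
    """Average number of bits processed per clock cycle."""
    return rate_bps / clock_hz


def cycles_per_packet(packet_bytes, rate_bps, clock_hz):
    """Average cycle budget per packet at the given line rate and clock."""
    return packet_bytes * 8 / bits_per_cycle(rate_bps, clock_hz)


# 6.6 Gbps at 500 MHz works out to about 13.2 bits per cycle.
drr_bits = bits_per_cycle(6.6e9, 500e6)
# With an assumed 64-byte packet, that is a budget of roughly 39 cycles.
drr_budget = cycles_per_packet(64, 6.6e9, 500e6)
```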

SLIDE 16

Frequency Effects

  • Increase in memory/FU latency (empirical)
  • Increase in performance not commensurate with frequency increase
    – Performance improvement with doubled frequency (B-M1) is < 37%, (M1-M2) < 31%
  • Need for efficient latency hiding techniques
    – (SMT, TCP)?
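
One simple way to see why performance does not track frequency is an Amdahl-style model in which part of the run time is spent on latencies that are fixed in absolute time and so do not shrink when the clock is raised. This is an assumed back-of-the-envelope model, not the simulation behind the numbers above; the function name and the `stall_fraction` parameter are hypothetical.

```python
def speedup_from_frequency(stall_fraction, freq_ratio):
    """Speedup when the clock is scaled by `freq_ratio`.

    stall_fraction -- share of baseline run time spent on latencies that
                      stay constant in absolute time (assumed model input)
    freq_ratio     -- new clock frequency divided by baseline frequency
    """
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be between 0 and 1")
    if freq_ratio <= 0:
        raise ValueError("freq_ratio must be positive")
    # Only the non-stalled share of the run time scales with the clock.
    new_time = stall_fraction + (1.0 - stall_fraction) / freq_ratio
    return 1.0 / new_time


ideal = speedup_from_frequency(0.0, 2.0)      # no fixed latencies
half_stalled = speedup_from_frequency(0.5, 2.0)
```

Under this model, a gain below 37% from doubling the clock would correspond to a stall fraction of roughly 0.46 or more, which is only indicative given how coarse the model is.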

SLIDE 17

Conclusions

  • This study provides performance data for statically scheduled processors, for networking applications
  • Operation frequencies differ from SPEC and media applications
    – Organization of FUs
  • High static branch prediction rates
    – Make static scheduling attractive for networking applications
  • Speedup due to SB and HB optimizations can be as high as 2.4x

SLIDE 18

Conclusions…

  • HB optimizations improve performance by < 8% over SB
    – The additional complexity might not be justified
  • The performance advantage of an IOS over VLIW is less than 1.8%
    – VLIW, being complexity effective (CE), might be more attractive
  • Simulation results show average network rates of 6.6 Gbps for drr, at 500 MHz for 8-issue VLIW with SB optimization
  • Need to exploit packet-level parallelism

SLIDE 19

Thank You