Towards an Efficient Switch Architecture for High-Radix Switches

  • G. Mora¹
  • J. Flich¹
  • J. Duato¹
  • P. López¹
  • E. Baydal¹
  • O. Lysne²

¹Department of Computer Engineering, Technical University of Valencia, Spain

²Simula Lab, Oslo, Norway

ACM/IEEE Symposium on Architectures for Networking and Communications Systems 2006

G. Mora, J. Flich, J. Duato, P. López, E. Baydal, O. Lysne. Towards an Efficient Switch Architecture for High-Radix Switches

Outline

1. Motivation
2. PCIQ Switch Architecture
     • Description
     • Evaluation
     • Enhancements and Cost Analysis
3. Summary

Motivation

HPC requires efficient interconnection networks, and interconnection network efficiency largely depends on switch design. How should these switches be built? In particular, how should their pin bandwidth be used?

  • Low-radix switches with wide channels.
  • High-radix switches with narrow channels.

Switch Solutions to Build Large Networks

Networks using low-radix switches with wide channels:
  • ↑ Network latency
  • ↑ Network cost
  • ↑ Power consumption

Networks using high-radix switches with narrow channels:
  • ↓ Network latency
  • ↓ Network cost
  • ↓ Power consumption
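The latency side of this trade-off can be illustrated with a common first-order model. This is an assumption for illustration, not the authors' analysis: a switch has a fixed pin bandwidth B split evenly over its k ports, so each channel carries B/k; reaching N end nodes through a butterfly-like network takes ⌈log_k N⌉ switch hops; and zero-load latency is the hop count times a per-hop router delay t_r plus the serialization time L/(B/k) of an L-byte packet. All parameter values below are hypothetical.

```python
def hop_count(num_nodes: int, radix: int) -> int:
    """ceil(log_radix(num_nodes)), computed with integers to avoid float rounding."""
    hops, reach = 0, 1
    while reach < num_nodes:
        reach *= radix  # each stage of radix x radix switches multiplies the reach by radix
        hops += 1
    return hops


def zero_load_latency(num_nodes: int, radix: int, pin_bw: float,
                      packet_bytes: int, router_delay: int) -> float:
    """Cycles for one packet at zero load: per-hop router delay plus serialization."""
    channel_bw = pin_bw / radix  # fixed pin bandwidth shared by all ports
    return hop_count(num_nodes, radix) * router_delay + packet_bytes / channel_bw


# Hypothetical parameters: 4096 nodes, 256 bytes/cycle of pin bandwidth per switch,
# 256-byte packets, 20-cycle router delay.
for k in (4, 16, 64):
    print(k, zero_load_latency(4096, k, pin_bw=256.0, packet_bytes=256, router_delay=20))
# 4 124.0
# 16 76.0
# 64 104.0
```

Under this model, raising the radix from 4 to 16 halves the hop count and cuts latency, while radix 64 makes each channel so narrow that serialization dominates; how far the radix should grow depends on the available pin bandwidth.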

Cost of Building a Switch

We need to keep switch efficiency high at an affordable cost. The cost depends on:

  • Memory resources.
  • Arbiter logic.
  • Internal connection logic.

Head-of-Line (HOL) Blocking

HOL blocking largely affects switch efficiency. It occurs when the packet at the head of a queue is blocked, so the packets behind it cannot advance even though they request free output ports.
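A minimal Python sketch of the effect for a single FIFO input queue. Packets are represented only by their destination output, and the set of busy outputs is given as input; both representations are illustrative assumptions.

```python
from typing import List, Set


def fifo_departures(queue: List[int], busy_outputs: Set[int]) -> List[int]:
    """Packets that can leave a single FIFO this cycle: at most the head."""
    if queue and queue[0] not in busy_outputs:
        return [queue[0]]
    return []


def hol_blocked(queue: List[int], busy_outputs: Set[int]) -> List[int]:
    """Packets stuck behind a blocked head although their own outputs are free."""
    if not queue or queue[0] not in busy_outputs:
        return []  # the head is not blocked, so there is no HOL blocking
    return [dest for dest in queue[1:] if dest not in busy_outputs]


queue = [2, 0, 1]  # destinations, head first
busy = {2}         # output 2 is currently in use
print(fifo_departures(queue, busy))  # []
print(hol_blocked(queue, busy))      # [0, 1]
```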

Switch Organizations

OQ - Output Queueing (N × N switch)

  • Memory requirements ∼ N
  • No HOL blocking
  • Speedup of N

Switch Organizations

IQ - Input Queueing (N × N switch)

  • Memory requirements ∼ N
  • No speedup
  • HOL blocking limits maximum throughput to about 58%
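The roughly 58% limit is the classic saturation throughput of a FIFO input-queued switch, which tends to 2 − √2 ≈ 0.586 as N grows under uniform, independent destinations and fixed-size packets (Karol, Hluchyj and Morgan). A small Monte Carlo sketch in Python reproduces it; the function name, the always-backlogged inputs and the random choice among contending inputs are modeling assumptions made here, not details from the slides.

```python
import random
from typing import Dict, List


def iq_saturation_throughput(n_ports: int, cycles: int, seed: int = 1) -> float:
    """Fraction of output capacity used by a saturated N x N FIFO input-queued switch."""
    rng = random.Random(seed)
    # Saturation: every input always has a packet at the head of its FIFO,
    # so only the head destination of each input needs to be tracked.
    head = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(cycles):
        contenders: Dict[int, List[int]] = {}
        for inp, out in enumerate(head):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)            # each requested output grants one input
            delivered += 1
            head[winner] = rng.randrange(n_ports)  # the next packet reaches the head
        # Losing inputs keep their blocked head, and every packet behind it waits.
    return delivered / (cycles * n_ports)


print(iq_saturation_throughput(2, 20000))  # close to 0.75
print(iq_saturation_throughput(32, 5000))  # close to 0.59
```

For N = 2 the exact value is 0.75; the estimate drops towards 0.586 as the port count grows.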

Switch Organizations

IQ - Input Queueing with VOQ (Virtual Output Queues) (N × N switch)

  • No HOL blocking at switch level
  • No speedup
  • Memory requirements ∼ N²

Switch Organizations

CIOQ - Combined Input-Output Queueing (N × N switch)

  • Memory requirements ∼ 2N
  • HOL blocking at switch level
  • Maximum speedup of 2 or 3

Switch Organizations

BC - Buffered Crossbar (N × N switch)

  • No HOL blocking
  • Low-cost arbiters
  • Memory requirements ∼ N²

Switch Organizations

HC - Hierarchical Crossbar (p-hierarchical N × N switch)

  • An intermediate solution between CIOQ and BC: the buffered crossbar with N ports is substituted by smaller switches with p ports.
  • Memory requirements ∼ N²/p
  • Speedup
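The memory scaling quoted for each organization can be tabulated at the 24-port radix used later in the evaluation. This minimal Python sketch only evaluates the leading-order expressions from these slides (constants and buffer depths ignored); the function name and the per-organization comments are illustrative.

```python
from typing import Dict


def memory_scaling(n: int, p: int) -> Dict[str, int]:
    """Leading-order memory counts for an N x N switch (constants ignored)."""
    if n % p != 0:
        raise ValueError("p must divide N")
    return {
        "OQ": n,                  # one queue per output, but memories need speedup N
        "IQ": n,                  # one FIFO per input; HOL blocking caps throughput
        "IQ+VOQ": n * n,          # N virtual output queues at each of the N inputs
        "CIOQ": 2 * n,            # one input and one output memory per port
        "BC": n * n,              # one buffer per crosspoint
        "HC": (n // p) ** 2 * p,  # (N/p)^2 subswitches with order-p buffers -> N^2/p
    }


for org, mems in memory_scaling(24, 12).items():
    print(f"{org:7s} {mems}")
```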

Switch Organizations

None of these architectures scales, either because of high cost or because of low switch efficiency. We need a better proposal for high-radix switches: one that scales, achieves high switch efficiency, and eliminates the HOL blocking problem.

PCIQ fulfills all these requirements.

Starting Point

To keep the implementation cost in view, we present PCIQ constructively, monitoring switch efficiency, memory requirements, and crossbar complexity at each step. We start with a CIOQ switch organization without speedup.

[Figure: CIOQ switch with input links, input memories, crossbar, output memories, output links, and the routing & arbitration unit]

First Modification: Increasing Read Bandwidth

[Figure: CIOQ switch; chart comparing switch efficiency, memory requirements and crossbar complexity]

To increase switch efficiency, we increase the read bandwidth of the input memories.

Doubling SRAM Read Bandwidth

How can the SRAM read bandwidth be doubled?

  • Split the SRAM into two independent modules. This doubles the silicon area (and SRAM size) and requires extra logic to select the SRAM module.
  • Implement two read ports. This increases silicon area by 25%; full-custom designs or HMA techniques can reduce this extra area.

First Modification: Increasing Read Bandwidth

[Figure: CIOQ and CIOQ-2rp (input memories with two read ports); chart comparing switch efficiency, memory requirements and crossbar complexity]

Second Modification: Splitting Crossbars

To keep arbiter complexity constant, we split the crossbar into two separate crossbars.

[Figure: switch with a single crossbar and the same switch with its crossbar split into two crossbars]

Second Modification: Splitting Crossbars

[Figure: CIOQ, CIOQ-2rp and PC organizations; chart comparing switch efficiency, memory requirements and crossbar complexity]

Third Modification: Removing the Output Memories

[Figure: switch with input memories, split crossbars and the routing & arbitration unit, without output memories]

Each read port forwards packets to a different set of output links, so output memories receive data only at the link rate. Output memories can therefore be removed, which compensates for the extra cost of the dual-ported memory.
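A rough Python sketch of the memory-area bookkeeping behind this step, using the 25% area overhead for a second read port quoted for doubling SRAM read bandwidth. It assumes, hypothetically, that input and output memories have equal capacity and that total area scales linearly with the number of memories; the constant and function names are illustrative.

```python
DUAL_PORT_OVERHEAD = 0.25  # extra silicon area for a second read port (from the SRAM slide)


def memory_area(n_ports: int, dual_read_port: bool, output_memories: bool) -> float:
    """Total SRAM area in units of one single-read-port per-port memory.

    Hypothetical assumption: input and output memories have the same capacity.
    """
    per_input = 1.0 + (DUAL_PORT_OVERHEAD if dual_read_port else 0.0)
    total = n_ports * per_input
    if output_memories:
        total += n_ports * 1.0  # one single-ported output memory per port
    return total


for name, dual, outq in [("CIOQ", False, True), ("CIOQ-2rp", True, True), ("PCIQ", True, False)]:
    print(name, memory_area(24, dual, outq))
# CIOQ 48.0
# CIOQ-2rp 54.0
# PCIQ 30.0
```

Under these assumptions, dropping the output memories more than pays for the second read port.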

PCIQ Switch Organization

Overview

[Figure: evolution CIOQ → CIOQ-2rp → PC → PCIQ; chart comparing switch efficiency, memory requirements and crossbar complexity]

PCIQ Switch Organization

Routing and Flow Control

  • Packets must be stored in the correct queue.
  • Credit-based flow control at memory level.
  • Xon/Xoff flow control at queue level.
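A minimal Python sketch of the two flow-control levels listed above: the sender keeps credits for free space in the downstream input memory, and each downstream queue raises Xoff above a high-water mark and Xon below a low-water mark. Byte-granularity credits, the thresholds, and all class and method names are assumptions for illustration.

```python
from typing import List, Optional


class UpstreamCredits:
    """Sender side of a link: memory-level credits plus per-queue Xon/Xoff state."""

    def __init__(self, memory_bytes: int, num_queues: int) -> None:
        self.credits = memory_bytes        # free bytes in the downstream input memory
        self.xon: List[bool] = [True] * num_queues

    def can_send(self, queue: int, size: int) -> bool:
        # Both levels must agree: memory space available AND the queue is not stopped.
        return self.xon[queue] and self.credits >= size

    def send(self, queue: int, size: int) -> None:
        if not self.can_send(queue, size):
            raise RuntimeError("flow control forbids sending")
        self.credits -= size

    def on_credit_return(self, size: int) -> None:
        self.credits += size               # downstream freed `size` bytes

    def on_queue_status(self, queue: int, xon: bool) -> None:
        self.xon[queue] = xon


class QueueXonXoff:
    """Downstream side of one queue: Xoff above a high mark, Xon below a low mark."""

    def __init__(self, xoff_bytes: int, xon_bytes: int) -> None:
        self.xoff_bytes, self.xon_bytes = xoff_bytes, xon_bytes
        self.stopped = False

    def update(self, occupancy: int) -> Optional[bool]:
        """Return False (send Xoff) or True (send Xon) when the state changes, else None."""
        if not self.stopped and occupancy >= self.xoff_bytes:
            self.stopped = True
            return False
        if self.stopped and occupancy <= self.xon_bytes:
            self.stopped = False
            return True
        return None
```

The gap between the two thresholds adds hysteresis, so a queue near its limit does not toggle Xon/Xoff on every packet.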

PCIQ Switch Organization

Arbiter

Two identical arbiters are required, one per crossbar, each associated with one read port of every input memory. Each is implemented as a hierarchical round-robin arbiter (not new):

  • 1st level: q × 1 round-robin arbiter (among q queues).
  • 2nd level: N × 1 round-robin arbiter (among N memories).

The arbiter works asynchronously at packet level, so its efficiency increases with traffic load.
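A minimal Python sketch of such a two-level round-robin arbiter: each input memory nominates one of its q queues (1st level), and an N × 1 round-robin picks among the memories that have a nominee (2nd level). The class names, the request-matrix format, and the rule that pointers advance only for the final winner are assumptions; the sketch issues one grant per call, since the slides do not detail how grants map to the outputs of each crossbar.

```python
from typing import List, Optional, Tuple


class RoundRobin:
    """n x 1 round-robin arbiter: the search starts just after the last winner."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.ptr = 0

    def peek(self, requests: List[bool]) -> Optional[int]:
        for i in range(self.n):
            idx = (self.ptr + i) % self.n
            if requests[idx]:
                return idx
        return None

    def commit(self, winner: int) -> None:
        self.ptr = (winner + 1) % self.n  # the winner gets the lowest priority next time


class HierarchicalArbiter:
    def __init__(self, n_memories: int, q_queues: int) -> None:
        self.level1 = [RoundRobin(q_queues) for _ in range(n_memories)]
        self.level2 = RoundRobin(n_memories)

    def arbitrate(self, requests: List[List[bool]]) -> Optional[Tuple[int, int]]:
        """requests[m][q] is True if queue q of memory m has a packet for this crossbar."""
        nominees = [rr.peek(row) for rr, row in zip(self.level1, requests)]  # level 1
        mem = self.level2.peek([nom is not None for nom in nominees])        # level 2
        if mem is None:
            return None
        queue = nominees[mem]
        assert queue is not None
        # Only the final winner's pointers move, which keeps both levels fair.
        self.level2.commit(mem)
        self.level1[mem].commit(queue)
        return mem, queue


arb = HierarchicalArbiter(n_memories=3, q_queues=2)
all_busy = [[True, True] for _ in range(3)]
print([arb.arbitrate(all_busy) for _ in range(6)])
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```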

PCIQ Switch Organization

Arbiter Efficiency

Arbiter efficiency increases with traffic load, especially for asymmetrical crossbars.

N × N crossbar (symmetrical):
  • 8 inputs requesting 8 outputs (ratio 1 to 1)
  • 7 inputs requesting 7 outputs (ratio 1 to 1)
  • 6 inputs requesting 6 outputs (ratio 1 to 1)
  • 5 inputs requesting 5 outputs (ratio 1 to 1)

N × N/2 crossbar (asymmetrical):
  • 8 inputs requesting 4 outputs (ratio 2 to 1)
  • 7 inputs requesting 3 outputs (ratio 2.33 to 1)
  • 6 inputs requesting 2 outputs (ratio 3 to 1)
  • 5 inputs requesting 1 output (ratio 5 to 1)
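One simple way to see why the input-to-output ratio matters is a probabilistic model (a hypothetical illustration, not the slides' analysis): if each requesting input targets one free output chosen uniformly at random, a given free output receives at least one request with probability 1 − (1 − 1/M)^R for R requesting inputs and M free outputs. This Python sketch evaluates it for the configurations listed above.

```python
def free_output_grant_probability(requesting_inputs: int, free_outputs: int) -> float:
    """P(a given free output gets >= 1 request) when each requesting input
    targets one free output uniformly at random (hypothetical model)."""
    return 1.0 - (1.0 - 1.0 / free_outputs) ** requesting_inputs


symmetric = [(8, 8), (7, 7), (6, 6), (5, 5)]
asymmetric = [(8, 4), (7, 3), (6, 2), (5, 1)]
for (ri, fo), (ra, fa) in zip(symmetric, asymmetric):
    print(f"{ri}->{fo}: {free_output_grant_probability(ri, fo):.3f}   "
          f"{ra}->{fa}: {free_output_grant_probability(ra, fa):.3f}")
```

In this model a free output of the symmetrical crossbar is requested only about 66% of the time, while in the asymmetrical crossbar the probability rises from 0.90 to 1.0 as the ratio grows.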

PCIQ Switch Organization

Extending PCIQ to More than Two Subcrossbars

PCIQ can be further partitioned into more than two subcrossbars.

[Figure: PCIQ with two crossbars and PCIQ with more than two crossbars]

PCIQ is therefore a new family of switch architectures that fills the gap between CIOQ and BC.
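With k subcrossbars, each input memory has one read port per subcrossbar, and each read port serves a different group of output links. A minimal Python sketch of that mapping and of what one input memory can forward in a single cycle. The contiguous grouping of outputs, the assumption that each read port can reach the oldest packet for its group (for example, via one queue per group), and the neglect of contention with other input memories are hypothetical simplifications.

```python
from typing import List, Optional


def subcrossbar_of(output: int, n_ports: int, k: int) -> int:
    """Subcrossbar (and read port) serving an output; assumes k contiguous groups of N/k."""
    if n_ports % k != 0:
        raise ValueError("k must divide the number of ports")
    return output // (n_ports // k)


def read_port_picks(queued: List[int], n_ports: int, k: int) -> List[Optional[int]]:
    """Per read port, the destination of the oldest queued packet it can forward."""
    picks: List[Optional[int]] = [None] * k
    for dest in queued:                       # oldest packet first
        port = subcrossbar_of(dest, n_ports, k)
        if picks[port] is None:
            picks[port] = dest
    return picks


print(read_port_picks([0, 1, 13], n_ports=24, k=2))  # [0, 13]
print(read_port_picks([0, 1, 2], n_ports=24, k=2))   # [0, None]
```

Two packets bound for different output groups leave the same memory in one cycle, whereas a single-read-port CIOQ memory would forward only one of them.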

Evaluation of PCIQ

Configurations Analyzed

  • Switch: 24 ports
  • Organizations: basic CIOQ, PCIQ with 2 crossbars (PCIQ-2xbar), PCIQ with 4 crossbars (PCIQ-4xbar), HC (p = 12)
  • Packet size: 256 bytes

Evaluation of PCIQ

Simulator

  • Event-driven simulator working at the clock level.
  • Virtual cut-through (VCT) switching and flow control are modeled.
  • Each switch link is attached to an end node.
  • The maximum injection rate of the end nodes is the link rate (1 byte/cycle).
  • A three-phase arbiter is modeled.
  • The switch forwards a byte from an input to an output in one cycle.
Evaluation of PCIQ

Results for Uniform Traffic

[Figure: throughput (accepted vs. injected traffic, bytes/cycle/port) and network latency (cycles vs. injected traffic) for 256-byte packets; series: HC, PCIQ-4xbar, PCIQ-2xbar, CIOQ-2Q, CIOQ-1Q]

Evaluation of PCIQ

Results for Hot-spot plus Uniform Traffic

[Figure: throughput (accepted vs. injected traffic, bytes/cycle/port) and network latency (cycles vs. injected traffic) for 256-byte packets; series: HC, PCIQ-4xbar, PCIQ-2xbar, CIOQ-2Q, CIOQ-1Q]

Hot-spot traffic: each input sends 10% of its traffic to a single hot-spot destination and the rest to random destinations.
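A Python sketch of this destination pattern and of the load it places on the hot-spot output. It assumes that the remaining 90% is spread uniformly over all N outputs (hot spot included) and that every input injects at the same rate; the function names are hypothetical.

```python
import random


def hotspot_destination(rng: random.Random, n_ports: int, hotspot: int,
                        hot_fraction: float = 0.1) -> int:
    """Pick a destination: the hot spot with probability hot_fraction, else uniform."""
    if rng.random() < hot_fraction:
        return hotspot
    return rng.randrange(n_ports)  # assumption: "the rest" is uniform over all outputs


def hotspot_output_load(n_ports: int, injection_rate: float,
                        hot_fraction: float = 0.1) -> float:
    """Offered load at the hot-spot output when every input injects injection_rate."""
    share = hot_fraction + (1.0 - hot_fraction) / n_ports  # per-input share sent there
    return n_ports * injection_rate * share


print(hotspot_output_load(24, 1.0))  # about 3.3 times the per-input injection rate
```

With N = 24, the hot-spot output receives 24 × (0.1 + 0.9/24) = 3.3 times the per-input injection rate, so under these assumptions it saturates once injection exceeds about 1/3.3 ≈ 0.30 bytes/cycle/port.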

Enhancements to PCIQ

Solutions to the HOL blocking problem:

  • VOQ: queue requirements grow quadratically, and it does not solve network HOL blocking.
  • Increasing speedup: increases the cost of the switch, and it does not solve network HOL blocking.
  • RECN (Regional Explicit Congestion Notification): a congestion management technique that dynamically detects congestion and separates congested packets from non-congested ones. It requires a very limited set of extra queues (set-aside queues, SAQs) and solves both switch and network HOL blocking.
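A much-simplified Python sketch of the RECN idea of moving congested packets out of the normal queue into a small pool of SAQs. Here a congested point is identified only by an output number, and all names are hypothetical; real RECN identifies congested points along packet routes and handles in-order delivery and SAQ deallocation more carefully, which this sketch omits.

```python
from collections import deque
from typing import Deque, Dict, List


class MemoryWithSAQs:
    """Input memory with one normal queue plus a few SAQs keyed by congested output."""

    def __init__(self, num_saqs: int) -> None:
        self.num_saqs = num_saqs
        self.normal: Deque[int] = deque()
        self.saqs: Dict[int, Deque[int]] = {}

    def notify_congestion(self, point: int) -> bool:
        """Allocate a SAQ for a congested point; False if no SAQ is free."""
        if point in self.saqs:
            return True
        if len(self.saqs) >= self.num_saqs:
            return False  # out of SAQs: these packets keep sharing the normal queue
        self.saqs[point] = deque()
        return True

    def release(self, point: int) -> None:
        """Deallocate a SAQ once it has drained (congestion gone)."""
        if point in self.saqs and not self.saqs[point]:
            del self.saqs[point]

    def enqueue(self, dest: int) -> None:
        # Congested traffic goes to its SAQ; everything else to the normal queue.
        self.saqs.get(dest, self.normal).append(dest)

    def heads(self) -> List[int]:
        """Packets eligible for arbitration: the head of every non-empty queue."""
        return [q[0] for q in (self.normal, *self.saqs.values()) if q]


mem = MemoryWithSAQs(num_saqs=2)
mem.notify_congestion(5)
for dest in (5, 5, 3):
    mem.enqueue(dest)
print(mem.heads())  # [3, 5]
```

The packet for output 3 is no longer stuck behind the packets for congested output 5, which is the HOL blocking the SAQs are meant to remove.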

Evaluation of PCIQ with RECN

Results for Uniform Traffic

[Figure: throughput (accepted vs. injected traffic, bytes/cycle/port) and network latency (cycles vs. injected traffic) for 256-byte packets; series: PCIQ-2xbar-4saqs, PCIQ-2xbar-2saqs, HC, PCIQ-4xbar]

Evaluation of PCIQ with RECN

Results for Hot-spot plus Uniform Traffic

[Figure: throughput (accepted vs. injected traffic, bytes/cycle/port) and network latency (cycles vs. injected traffic) for 256-byte packets; series: PCIQ-2xbar-4saqs, PCIQ-2xbar-2saqs, HC, PCIQ-4xbar]

Hot-spot traffic: each input sends 10% of its traffic to a single hot-spot destination and the rest to random destinations.

Evaluation of PCIQ

Special Results

  • RECN added to the CIOQ architecture.
    [Figure: accepted vs. injected traffic (bytes/cycle/port); series: PCIQ-2xbar-2saqs, CIOQ-2saqs]
  • Worst-case traffic for PCIQ.
    [Figure: accepted vs. injected traffic (bytes/cycle/port); series: PCIQ-2xbar-4saqs, PCIQ-2xbar-2saqs, CIOQ-2Q, HC, PCIQ-2xbar, CIOQ-1Q]

Cost Analysis

Cost depends on memory resources, the crossbar, and arbiter complexity.

Memory Resources

[Figure: number of required memories vs. switch radix; series: CIOQ (minimum cost), HC, PCIQ-2xbar]

Crossbar

  • CIOQ → one N × N crossbar: N² crosspoints
  • HC → (N/p)² crossbars of p × p: N² crosspoints
  • PCIQ → k crossbars of N × N/k: N² crosspoints

Arbiter Complexity

Arbiter complexity is deduced from crossbar complexity and is thus similar for all architectures.
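The crosspoint counts above can be checked with a short Python sketch. It also counts arbiters under an assumption of ours that the slides do not state: one arbiter per crossbar output port. With that assumption, a 4 × 4 switch with p = 2 and k = 2 reproduces the numbers of the cost overview (16 crosspoints in every organization; 4, 8 and 4 arbiters).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossbarCost:
    crossbars: int
    crosspoints: int
    arbiters: int  # assumption: one arbiter per crossbar output port


def cioq(n: int) -> CrossbarCost:
    """One N x N crossbar."""
    return CrossbarCost(crossbars=1, crosspoints=n * n, arbiters=n)


def hc(n: int, p: int) -> CrossbarCost:
    """(N/p)^2 subcrossbars of size p x p."""
    if n % p:
        raise ValueError("p must divide N")
    tiles = (n // p) ** 2
    return CrossbarCost(crossbars=tiles, crosspoints=tiles * p * p, arbiters=tiles * p)


def pciq(n: int, k: int) -> CrossbarCost:
    """k subcrossbars of size N x N/k."""
    if n % k:
        raise ValueError("k must divide N")
    outs = n // k
    return CrossbarCost(crossbars=k, crosspoints=k * n * outs, arbiters=k * outs)


print(cioq(4), hc(4, 2), pciq(4, 2), sep="\n")
```

All three organizations keep N² crosspoints; they differ in how those crosspoints are grouped into crossbars and, under the stated assumption, in the number of arbiters.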

Cost Analysis

Overview

[Figure: crossbars and arbiters of the CIOQ, HC and PCIQ organizations]

  • CIOQ: 16 crosspoints, 4 arbiters, 16 wires from the arbiters.
  • HC: 16 crosspoints, 8 arbiters, 16 wires from the arbiters.
  • PCIQ: 16 crosspoints, 4 arbiters, 16 wires from the arbiters.

  • G. Mora, J. Flich, J. Duato, P

. López, E. Baydal, O. Lysne Towards an Efficient Switch Architecture for High-Radix Switches

slide-56
SLIDE 56

Motivation PCIQ Switch Architecture Summary

Outline

1

Motivation

2

PCIQ Switch Architecture Description Evaluation Enhancements and Cost Analysis

3

Summary

  • G. Mora, J. Flich, J. Duato, P

. López, E. Baydal, O. Lysne Towards an Efficient Switch Architecture for High-Radix Switches

slide-57
SLIDE 57

Motivation PCIQ Switch Architecture Summary

Summary

High-radix switches are becoming a necessity. Current switch organizations suffer low efficiency or high cost.

  • G. Mora, J. Flich, J. Duato, P

. López, E. Baydal, O. Lysne Towards an Efficient Switch Architecture for High-Radix Switches

slide-59
SLIDE 59


Summary

High-radix switches are becoming a necessity. Current switch organizations suffer from either low efficiency or high cost. PCIQ relies on:

  • A partitioned crossbar that increases the read bandwidth without increasing the cost.
  • Two round-robin, packet-based arbiters (one for each crossbar).
  • A congestion management technique (RECN) that eliminates the head-of-line (HOL) blocking problem.
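To illustrate what "round-robin, packet-based" arbitration means, here is a minimal Python sketch of a single arbiter deciding which input drives one contended crossbar output. It is a generic model under our own assumptions, not the PCIQ implementation: the class name, the per-cycle `requests` vector (packet length in cycles, 0 if idle), and the cell-per-cycle timing are all hypothetical. "Packet-based" is modeled as holding the grant until the whole packet has crossed; "round-robin" as starting the search just after the last granted input.

```python
from typing import Optional, Sequence


class PacketRoundRobinArbiter:
    """Hypothetical round-robin, packet-based arbiter for one crossbar output."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        self.last = num_inputs - 1      # so input 0 has priority on the first grant
        self.owner: Optional[int] = None
        self.remaining = 0              # cycles left for the packet being transferred

    def arbitrate(self, requests: Sequence[int]) -> Optional[int]:
        """requests[i] = length (in cycles) of the packet input i wants to send, 0 if idle.

        Returns the input granted in this cycle, or None if nobody requests.
        """
        if self.owner is None:
            # Round-robin search starting right after the last granted input.
            for offset in range(1, self.num_inputs + 1):
                i = (self.last + offset) % self.num_inputs
                if requests[i] > 0:
                    self.owner, self.remaining, self.last = i, requests[i], i
                    break
            else:
                return None
        granted = self.owner
        self.remaining -= 1
        if self.remaining == 0:
            self.owner = None           # packet fully transferred: release the output
        return granted


arb = PacketRoundRobinArbiter(3)
print([arb.arbitrate([2, 0, 1]) for _ in range(4)])  # [0, 0, 2, 0]
```

Input 0 keeps the output for both cycles of its 2-cycle packet before input 2 is served, and priority then wraps around to input 0.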


slide-63
SLIDE 63


Summary

High-radix switches are becoming a necessity. Current switch organizations suffer from either low efficiency or high cost. PCIQ achieves:

  • A cost similar to or lower than that of basic organizations such as CIOQ.
  • Maximum switch efficiency under uniform traffic.
  • Complete elimination of both switch-level and network-wide HOL blocking, and thus maximum switch efficiency under non-uniform traffic.
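The cost of HOL blocking that PCIQ targets can be seen in a toy, single-cycle Python model. This is our own greedy illustration of the problem, not RECN and not the PCIQ scheduler; the function names and the list-of-destinations representation of each input queue are hypothetical. With one FIFO per input, only the head packet may compete. If packets are instead separated by destination, a blocked head no longer prevents a packet behind it from using a free output.

```python
def serve_fifo(inputs: list[list[int]]) -> list[tuple[int, int]]:
    """One cycle, one FIFO per input: only head-of-line packets may compete."""
    busy: set[int] = set()
    grants = []
    for i, queue in enumerate(inputs):
        if queue and queue[0] not in busy:
            busy.add(queue[0])
            grants.append((i, queue[0]))
    return grants


def serve_per_output(inputs: list[list[int]]) -> list[tuple[int, int]]:
    """One cycle, packets separated by destination: any queued packet may compete."""
    busy: set[int] = set()
    grants = []
    for i, queue in enumerate(inputs):
        for out in queue:
            if out not in busy:
                busy.add(out)
                grants.append((i, out))
                break  # each input still sends at most one packet per cycle
    return grants


# Each inner list holds the output ports of the packets queued at one input.
inputs = [[0, 1], [0, 2]]
print(serve_fifo(inputs))        # [(0, 0)]
print(serve_per_output(inputs))  # [(0, 0), (1, 2)]
```

With a single FIFO, input 1 stays idle because its head packet targets busy output 0, even though its next packet could use the free output 2.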


slide-68
SLIDE 68

Thank you very much for your attention.


slide-69
SLIDE 69

For Further Reading

  • J. Duato, S. Yalamanchili, L. Ni. Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2003.
  • J. Kim, W. J. Dally, B. Towles, A. K. Gupta. Microarchitecture of a high-radix router. 32nd ISCA, pp. 420–431, 2005.
  • E. S. Shin, V. J. Mooney III, G. F. Riley. Round-robin arbiter design and generation. 15th International Symposium on System Synthesis, 2002.
  • H. J. Mattausch. Hierarchical N-Port Memory Architecture based on 1-Port Memory Cells. ESSCIRC'97, pp. 348–351, 1997.
