Columbia University Chip-Scale Interconnection Networks Chip - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Hybrid On-chip Data Networks

Gilbert Hendry Keren Bergman Lightwave Research Lab

Columbia University

slide-2
SLIDE 2

2

Chip-Scale Interconnection Networks

Intel Polaris, IBM Cell, AMD Opteron

  • Chip multi-processors create need for high performance interconnects

  • Performance bottleneck of on-chip networks and I/O
  • Power dissipation constraints of the chip package
  • > 50% of total power comes from interconnects*

* N. Magen et al., “Interconnect-power dissipation in a microprocessor,” SLIP 2004.

slide-3
SLIDE 3

3

Motivation

  • CMPs of the future = 3D stacking
  • Lots of data on chip
  • Photonics offers key advantages

slide-4
SLIDE 4

4

Why Photonics?

ELECTRONICS:

  • Buffer, receive and re-transmit at every router.
  • Each bus lane routed independently (P ∝ N_LANES).
  • Off-chip BW is pin-limited and power hungry.

Photonics changes the rules for Bandwidth, Energy, and Distance.

OPTICS:

  • Modulate/receive high bandwidth data stream once per communication event.
  • Broadband switch routes entire multi-wavelength stream.
  • Off-chip BW = On-chip BW for nearly same power.

slide-5
SLIDE 5

5

Hybrid Network Premise

Optical processing difficult and limited

Source, destination routing inefficient

Use electronics for routing, optics for switching and transmission

Hybrid Circuit-Switching

slide-6
SLIDE 6

6

Hybrid Circuit-Switched Networks

Step 1: Path SETUP request

Electronic SETUP Msg Source core Destination Core

slide-7
SLIDE 7

7

Hybrid Circuit-Switched Networks

Step 2: Path ACK

Electronic ACK Msg

slide-8
SLIDE 8

8

Hybrid Circuit-Switched Networks

Step 3: Transmit Data

Photonic Switch Use Information

slide-9
SLIDE 9

9

Hybrid Circuit-Switched Networks

Meanwhile: Path Contention

Path BLOCKED Msg (Backoff)

slide-10
SLIDE 10

10

Hybrid Circuit-Switched Networks

Step 4: Path TEARDOWN

Electronic SETUP Msg Source core Destination Core

slide-11
SLIDE 11

11

Hybrid Circuit-Switched Networks

Pros:

  • Energy-efficient end-to-end transmission
  • High bandwidth through WDM
  • Electronic network still available for small control messages*
  • Network-level support for secure regions

Cons:

  • Path setup latency
  • Path setup contention (no fairness)

* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]
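The four steps on the preceding slides (SETUP, ACK, transmit, TEARDOWN) plus the BLOCKED case can be sketched as a small reservation model. This is only an illustration: the class `PhotonicPlane`, the function `send`, and the switch names are hypothetical, and the sketch assumes a blocked SETUP releases whatever part of the path it had already reserved before the source backs off.

```python
from dataclasses import dataclass, field


@dataclass
class PhotonicPlane:
    """Tracks which photonic switches are reserved by which circuit (hypothetical model)."""
    reserved: dict = field(default_factory=dict)  # switch id -> circuit id

    def setup(self, circuit_id, path):
        """Step 1: an electronic SETUP message reserves switches hop by hop."""
        taken = []
        for switch in path:
            if switch in self.reserved:
                # Path contention: undo the partial reservation, send BLOCKED back.
                for s in taken:
                    del self.reserved[s]
                return "BLOCKED"
            self.reserved[switch] = circuit_id
            taken.append(switch)
        # Step 2: the destination answers with an electronic ACK.
        return "ACK"

    def teardown(self, circuit_id):
        """Step 4: a TEARDOWN message frees every switch held by the circuit."""
        held = [s for s, c in self.reserved.items() if c == circuit_id]
        for s in held:
            del self.reserved[s]


def send(plane, circuit_id, path, payload):
    """Run one message through SETUP, ACK, transmit, TEARDOWN."""
    if plane.setup(circuit_id, path) == "BLOCKED":
        return "BLOCKED"  # the caller is expected to back off and retry
    delivered = payload  # Step 3: data crosses the optical circuit end to end
    plane.teardown(circuit_id)
    return delivered
```

With circuit A holding switches s0, s1, s2, a SETUP from circuit B over s3, s1 comes back BLOCKED and leaves s3 free; after A's TEARDOWN the same request succeeds. The model has no notion of fairness between competing sources, which matches the "no fairness" item in the Cons list.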

slide-12
SLIDE 12

Programming and Communication

slide-13
SLIDE 13

13

Shared Memory

Implicit Communication Explicit Communication scaling

“… [OpenMP on large systems] often performs worse than message passing due to a combination of false sharing, coherence traffic, contention, and system issues that arise from the difference in scheduling and network interface moderation” ~ Exascale Report

slide-14
SLIDE 14

14

Partitioned Global Address Space

Implicit Communication Explicit Communication

[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]

Access          Method
Local Read      Optical receive
Local Write     Optical send
Remote Read     Electronic request, optical receive
Remote Write    Optical send
Shared R/W      ?

slide-15
SLIDE 15

15

Message Passing

Implicit Communication Explicit Communication

* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]

  • Complex, dynamic access patterns
  • Relatively larger blocks of data
  • Scientific computing
slide-16
SLIDE 16

16

Streaming

Implicit Communication Explicit Communication

[Figure: stages 1, 2, 3, 4 between Input Data and Output Data, linked by persistent optical circuits]

  • Embedded / specialized systems (Graphics, Image + Signal Proc.)
  • Execution mode of general-purpose systems (Cell Processor)
slide-17
SLIDE 17

Electronic Plane

slide-18
SLIDE 18

18

Electronic Router

[Figure: electronic router block diagram. Labels: Arbiter, Control Router, Data Switch, Buffer, Crossbar, Buffer Cntrl, Data Path, Xbar Cntrl, Request Bus, Flow Control, Xbar Allocation, Data Switch Allocation, Routing Logic, Credits In, Ring Cntrl]

  • Low frequency operation (~ 1GHz)
  • 1 VC (typically)
  • Small buffers (64-28)
  • Narrow Channels (8-32)
slide-19
SLIDE 19

19

Network Gateway

[Figure: network gateway. Labels: four Cores, Network IF, Tx/Rx, Serialization, Drivers, Deserialization, Receivers, To/From Control plane, To/From Data plane. Legend: Bidirectional Waveguide, Bidirectional Electronic Channel, Control Router, Electronic Crossbar, 5-port photonic switch]

External Concentration

[P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]

slide-20
SLIDE 20

The Photonic Plane

slide-21
SLIDE 21

21

Silicon Photonic Waveguide Technology

[Vlasov and McNab, Optics Express 12 (8) 1622 (2004)]

C23 (1559 nm) C28 (1555 nm) C46 (1541 nm) C51 (1537 nm)

before injection into waveguide after 5-cm waveguide and EDFA

[B. G. Lee et al., Photon. Technol. Lett. 20 (10) 767 (2008)]

1.28 Tb/s Data Transmission Experiment (occupies small slice of available WG BW) 100 ps

Silicon photonic waveguides provide low-power optical interconnects in CMOS-compatible platform.

Low-loss (1.7 dB/cm), high-bandwidth (> 200 nm) silicon photonic waveguides can be fabricated in commercial CMOS process.

slide-22
SLIDE 22

22

Silicon Photonic Modulator and Detector Technology

[M Watts, Group Four Photonics (2008)] [M Lipson, Optics Express (2007)]

  • 85 fJ/bit demonstrated at 10 Gb/s
  • Scalable to < 25 fJ/bit
  • 18 Gb/s demonstrated

[S Koester, J. Lightw. Technol. (2007)]

Ge-on-Si Detectors:

  • 40-GHz bandwidths
  • 1 A/W responsivities

Receivers (detectors w/ CMOS amplifiers):

  • 1.1 pJ/bit demonstrated at 10 Gb/s
  • Scalable to < 50 fJ/bit

(CW) LASER modulator detector
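The energy-per-bit figures on this slide translate into device power once a line rate is fixed, since power is energy per bit times bits per second. The sketch below works that arithmetic for the 85 fJ/bit and 1.1 pJ/bit figures at 10 Gb/s. The function name `link_power_mw` is hypothetical, and reading the 85 fJ/bit figure as the modulator number is an assumption based on the slide title, because the slide text does not label it.

```python
def link_power_mw(energy_fj_per_bit, rate_gbps):
    """Convert energy per bit (fJ/bit) and line rate (Gb/s) into power (mW).

    1 fJ/bit * 1 Gb/s = 1e-15 J * 1e9 1/s = 1e-6 W = 1e-3 mW.
    """
    return energy_fj_per_bit * rate_gbps * 1e-3


# Figures quoted on the slide, both at 10 Gb/s per wavelength.
modulator_mw = link_power_mw(85, 10)     # 85 fJ/bit (assumed to be the modulator)
receiver_mw = link_power_mw(1100, 10)    # 1.1 pJ/bit = 1100 fJ/bit (receiver)

print(f"modulator: {modulator_mw:.2f} mW, receiver: {receiver_mw:.2f} mW")
```

At these demonstrated values the receiver, at roughly 11 mW per wavelength, costs more than ten times the modulator at under 1 mW, which is why the slide's "scalable to" targets matter for links carrying many wavelengths.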

slide-23
SLIDE 23

23

Silicon Photonic Micro-Ring Switch Explanation

Ports: in0, in1, out0, out1

Fast control of resonance wavelength via carrier injection

[Figure: Transmission (in_i → out_i) vs. λ, showing the bar state and the cross state]

  • No current: on-resonance
  • Current: off-resonance
slide-24
SLIDE 24

24

Higher Order Switch Designs

slide-25
SLIDE 25

25

On-Chip Topology Exploration

  • Photonic Torus
  • Nonblocking Photonic Torus

[A. Shacham et al., Trans. on Comput., 2008] [M. Petracca et al. IEEE Micro, 2008]

slide-26
SLIDE 26

26

On-Chip Topology Exploration

  • TorusNX
  • Square Root

[J. Chan et al. JLT, May 2010]

slide-27
SLIDE 27

27

Photonic Plane Characteristics

  • Insertion Loss
  • Noise
  • Power
slide-28
SLIDE 28

28

Insertion Loss and Optical Power Budget

Nonlinear Effects

WDM Factor

Optical Power Budget

Worst-case Insertion Loss

Detector Sensitivity
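The terms on this slide can be tied together in a short calculation. The sketch below assumes, since the slide itself gives no formula, that the optical power budget is the gap between the highest total power the waveguide tolerates before nonlinear effects set in and the detector sensitivity, and that the budget must cover the worst-case insertion loss plus a WDM factor of 10·log10(N) dB for N wavelengths. The function name and the 20 dBm / -20 dBm values are hypothetical placeholders.

```python
import math


def max_wavelengths(max_power_dbm, sensitivity_dbm, insertion_loss_db):
    """Largest number of WDM channels that fits in the optical power budget.

    Assumed relation: budget >= worst-case insertion loss + 10*log10(N).
    """
    budget_db = max_power_dbm - sensitivity_dbm    # optical power budget
    margin_db = budget_db - insertion_loss_db      # what is left for the WDM factor
    if margin_db < 0:
        return 0                                   # even one wavelength does not close the link
    return math.floor(10 ** (margin_db / 10))


# Hypothetical device limits; 31.2 dB is the 8×8 Torus insertion loss
# reported under Simulation Results.
n = max_wavelengths(max_power_dbm=20, sensitivity_dbm=-20, insertion_loss_db=31.2)
print(n)
```

Under this relation every 3 dB of insertion loss saved roughly doubles the number of usable wavelengths, which is the tradeoff the "Insertion Loss vs. Bandwidth" slide points at.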

slide-29
SLIDE 29

29

Insertion Loss vs. Bandwidth

Network Size Number of λ

Topologies

slide-30
SLIDE 30

30

Simulation Results

Insertion Loss (dB) vs. Topology Size (nodes)

Topology Size   Torus   Non-Blocking Torus   TorusNX   Square Root
4×4             20.6    18.7                 15.8      12.2
6×6             25.6    25.3                 19.5      -
8×8             31.2    31.5                 23.2      21.5
10×10           37.0    38.0                 27.1      -
12×12           42.8    44.1                 31.0      -
14×14           48.6    50.6                 34.9      -
16×16           54.5    56.8                 38.8      30.6
18×18           60.3    63.2                 42.7      -

Bar components: Propagation, Crossing, Dropping Into a Ring

slide-31
SLIDE 31

31

Simulation Results

[Figure: Number of Wavelength Channels vs. Number of Access Points for the Torus, Non-Blocking Torus, TorusNX, and Square Root topologies, with the optical power budget marked]

Original is based on the IL results from the previous slide; Improved is based on a hypothetical improvement in crossing loss from 0.15 dB to 0.05 dB.

slide-32
SLIDE 32

32

Photonic Plane Characteristics

  • Insertion Loss
  • Noise
  • Power
slide-33
SLIDE 33

33

Noise and Crosstalk

Laser Noise Inter-Message Crosstalk Intra-Message Crosstalk Modulation Noise Crosstalk Filter

Coherent noise Incoherent noise

slide-34
SLIDE 34

34

Effects of Noise

Network Size Optical SNR Number of λ Network Load

slide-35
SLIDE 35

35

Simulation Results

[Figure: Optical SNR (dB) vs. Message Size (bit, 10^0 to 10^7) for Torus, Non-blocking Torus, TorusNX, and Square Root]

The line at OSNR = 16.9 dB is where a bit-error-rate of 10^-12 can be achieved, assuming an ideal binary receiver circuit and orthogonal signaling.

Results

  • Results are plotted for network size of 8×8 at saturation, at the detectors.
  • Maximum OSNR = ~45 dB (due to laser noise)
  • Minimum OSNR < 17 dB (due to message-to-message crosstalk)
  • Variations between networks are due to the varying likelihood of two messages intersecting on the network topology.

System Performance

  • SNR measures the likelihood of error-free transmission.
  • Lower SNR designs will require additional retransmission, resulting in lower throughput performance.
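The 16.9 dB threshold quoted on this slide can be checked numerically. One reading that reproduces it is BER = Q(sqrt(SNR)), the textbook error rate for an ideal receiver with binary orthogonal signaling, where Q is the Gaussian tail function. That formula is an assumption here, not something the slides state, and the function name `ber_from_osnr_db` is hypothetical.

```python
import math


def ber_from_osnr_db(osnr_db):
    """Bit-error rate for binary orthogonal signaling with an ideal receiver.

    Assumed model: BER = Q(sqrt(SNR)), with Q(x) = 0.5 * erfc(x / sqrt(2)).
    """
    snr = 10 ** (osnr_db / 10)                   # dB -> linear ratio
    return 0.5 * math.erfc(math.sqrt(snr / 2))


for osnr_db in (10.0, 16.9, 20.0):
    print(f"OSNR {osnr_db:5.1f} dB -> BER {ber_from_osnr_db(osnr_db):.2e}")
```

Under this model 16.9 dB lands just above 10^-12, and the error rate climbs steeply below it, which is why designs whose minimum OSNR falls under 17 dB would need retransmission.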

slide-36
SLIDE 36

36

Photonic Plane Characteristics

  • Insertion Loss
  • Noise
  • Power
slide-37
SLIDE 37

37

Power Usage

[Figure: ring with electronic control (n-region, p-region, 0V/1V) and thermal control (ohmic heater, 0V/1V); Transmission vs. λ for the injected wavelengths, showing the off-resonance profile and the on-resonance profile]

  • Laser Power
  • Active Power
      • Modulating
      • Detecting
      • Broadband
  • Static Power
      • Thermal tuning
  • Tx/Rx Power
      • Drivers
      • TIAs
slide-38
SLIDE 38

38

Energy Per Bit

[Figure: Energy per Bit (J/bit, 10^-13 to 10^-7) vs. Message Size (bit, 10^0 to 10^7) for Torus, Non-blocking Torus, TorusNX, and Square Root]

slide-39
SLIDE 39

39

Power Breakdown

                    Torus Topology      Nonblocking Torus Topology
Router Logic        43%                 45%
Router Buffer       44%                 44%
Electronic Wire     3%                  2%
Detector            3%                  2%
Modulator           4%                  4%
PSE                 2%                  2%
Thermal             1%                  1%
Wavelengths         7 @ 10 Gbps each    12 @ 10 Gbps each
Power Dissipation   1.59 W              4.31 W

  • Results based on randomly generated traffic with message sizes of 100 kbit, with network in saturation.
  • Data was collected on 64-node topologies constrained to a total surface area of 2 cm × 2 cm.
slide-40
SLIDE 40

40

Power Breakdown

                    Square Root Topology   TorusNX Topology
Router Logic        37%                    34%
Router Buffer       31%                    31%
Electronic Wire     1%                     7%
Detector            10%                    8%
Modulator           17%                    14%
PSE                 1%                     2%
Thermal             3%                     4%
Wavelengths         38 @ 10 Gbps each      27 @ 10 Gbps each
Power Dissipation   3.22 W                 1.89 W
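The percentage shares become easier to compare once they are converted to watts. The sketch below does this for the Torus topology, assuming that the first pie chart on the previous slide (43% router logic, 44% router buffer, and so on) and the 1.59 W figure both belong to the Torus, which is how the labels are ordered there. The function name `component_power_w` is hypothetical.

```python
def component_power_w(total_w, shares_percent):
    """Split a total power figure (W) into per-component power by percentage share."""
    return {name: total_w * pct / 100 for name, pct in shares_percent.items()}


# Torus topology, 64 nodes, 100 kbit messages at saturation
# (assignment of this pie chart and total to the Torus is assumed).
torus_shares = {
    "Router Logic": 43, "Router Buffer": 44, "Electronic Wire": 3,
    "Detector": 3, "Modulator": 4, "PSE": 2, "Thermal": 1,
}
torus_w = component_power_w(1.59, torus_shares)

electronic_w = sum(torus_w[k] for k in ("Router Logic", "Router Buffer", "Electronic Wire"))
print(f"electronic control plane: {electronic_w:.2f} W of 1.59 W")
```

Read this way, about 90% of the Torus power is spent in the electronic routers and wires rather than in the photonic devices.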
slide-41
SLIDE 41

41

Performance

slide-42
SLIDE 42

Other Interesting Issues

slide-43
SLIDE 43

43

Memory Access

Processor Core Network Router Memory Access Point

[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]

slide-44
SLIDE 44

44

Other Arbitration Means - TDM

[G. Hendry et al. Silicon Nanophotonic Network-On-Chip Using TDM Arbitration. In HOTI, Aug. 2010]

slide-45
SLIDE 45

45

Wavelength Granularity

  • Original
  • Re-design
  • Scalable number of WDM channels

slide-46
SLIDE 46

46

Conclusion

  • Some applications / programming models definitely well-suited to a circuit-switched photonic network
  • Interesting tradeoffs and design space