Columbia University Chip-Scale Interconnection Networks Chip - - PowerPoint PPT Presentation
Hybrid On-chip Data Networks
Gilbert Hendry, Keren Bergman
Lightwave Research Lab, Columbia University
2
Chip-Scale Interconnection Networks
Intel Polaris, IBM Cell, AMD Opteron
- Chip multiprocessors (CMPs) create the need for high-performance interconnects
- Performance bottleneck of on-chip networks and I/O
- Power dissipation constraints of the chip package
- > 50% of total power comes from interconnects*
* N. Magen et al., “Interconnect-power dissipation in a microprocessor,” SLIP 2004.
3
Motivation
- CMPs of the future = 3D stacking
- Lots of data on chip
- Photonics offers key advantages
4
Why Photonics?
ELECTRONICS:
- Buffer, receive, and re-transmit at every router.
- Each bus lane routed independently (P ∝ N_LANES).
- Off-chip BW is pin-limited and power-hungry.
Photonics changes the rules for Bandwidth, Energy, and Distance.
OPTICS:
- Modulate/receive a high-bandwidth data stream once per communication event.
- Broadband switch routes the entire multi-wavelength stream.
- Off-chip BW = on-chip BW for nearly the same power.
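A minimal Python sketch of the electronics-vs-optics contrast above, under loudly hypothetical numbers: the electronic path pays an assumed per-hop router-plus-wire cost (`e_hop_pj`) at every router, while the optical path pays modulation and reception once. The endpoint costs reuse the 85 fJ/bit modulator and 1.1 pJ/bit receiver figures quoted in this deck; the per-hop photonic switch cost is a placeholder set to zero, and laser power and path setup are ignored.

```python
def electronic_energy_pj(hops, e_hop_pj=0.5):
    """Energy per bit (pJ) when every router buffers, receives and re-transmits.

    e_hop_pj is a hypothetical per-hop cost (router + wire) in pJ/bit.
    """
    return hops * e_hop_pj


def optical_energy_pj(hops, e_mod_pj=0.085, e_rx_pj=1.1, e_switch_pj=0.0):
    """Energy per bit (pJ) when the stream is modulated and received once.

    e_mod_pj / e_rx_pj: 85 fJ/bit modulator and 1.1 pJ/bit receiver figures.
    e_switch_pj: hypothetical per-hop cost of a broadband photonic switch
    (zero here; real switches also draw static and tuning power).
    """
    return e_mod_pj + e_rx_pj + hops * e_switch_pj


def crossover_hops(e_hop_pj=0.5, max_hops=64):
    """Smallest hop count at which the optical path costs less per bit."""
    for hops in range(1, max_hops + 1):
        if optical_energy_pj(hops) < electronic_energy_pj(hops, e_hop_pj):
            return hops
    return None  # electronics stays cheaper over the whole range


if __name__ == "__main__":
    for e_hop in (0.5, 0.1):
        print(f"e_hop_pj={e_hop}: optics cheaper from {crossover_hops(e_hop)} hops")
```

With these placeholder values the optical path becomes cheaper per bit from 3 hops onward (12 hops if the per-hop electronic cost is 0.1 pJ/bit); the point is only that the electronic cost grows with distance while the optical endpoint cost does not.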
5
Hybrid Network Premise
- Optical processing is difficult and limited
- Source/destination routing is inefficient
- Use electronics for routing, optics for switching and transmission
Hybrid Circuit-Switching
6
Hybrid Circuit-Switched Networks
Step 1: Path SETUP request
Electronic SETUP Msg (source core → destination core)
7
Hybrid Circuit-Switched Networks
Step 2: Path ACK
Electronic ACK Msg
8
Hybrid Circuit-Switched Networks
Step 3: Transmit Data
Photonic Switch Use Information
9
Hybrid Circuit-Switched Networks
Meanwhile: Path Contention
Path BLOCKED Msg (Backoff)
10
Hybrid Circuit-Switched Networks
Step 4: Path TEARDOWN
Electronic TEARDOWN Msg (source core → destination core)
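The SETUP / ACK / BLOCKED / TEARDOWN steps can be sketched as a toy Python model of link reservation. This is a hypothetical simplification, not the simulator behind these slides: a path is a list of link ids, each photonic link carries at most one circuit at a time, and the backoff policy after BLOCKED is left to the caller.

```python
class CircuitNetwork:
    """Toy model of the hybrid circuit-switching handshake.

    Hypothetical simplification: a path is a list of link ids, and each
    photonic link can be held by at most one circuit at a time.
    """

    def __init__(self):
        self.owner = {}  # link id -> id of the circuit currently holding it

    def setup(self, circuit_id, path):
        """Steps 1-2: SETUP reserves links hop by hop on the electronic plane.

        Returns "ACK" if every link was reserved, or "BLOCKED" if a link is
        already held; links reserved so far are released so the source can
        back off and retry later.
        """
        reserved = []
        for link in path:
            if link in self.owner:
                for held in reserved:
                    del self.owner[held]
                return "BLOCKED"
            self.owner[link] = circuit_id
            reserved.append(link)
        return "ACK"

    def teardown(self, circuit_id, path):
        """Step 4: TEARDOWN frees the links held by this circuit."""
        for link in path:
            if self.owner.get(link) == circuit_id:
                del self.owner[link]


if __name__ == "__main__":
    net = CircuitNetwork()
    path_a = ["s0-s1", "s1-s2"]
    path_b = ["s4-s1", "s1-s2", "s2-s3"]
    print(net.setup("A", path_a))  # ACK: A's circuit is established
    print(net.setup("B", path_b))  # BLOCKED: link s1-s2 is held by A
    # Step 3 (optical data transfer for A) would happen here.
    net.teardown("A", path_a)
    print(net.setup("B", path_b))  # ACK on retry after A's teardown
```

Releasing the partial reservation on BLOCKED mirrors the "Path BLOCKED Msg (Backoff)" step: the blocked source holds nothing while it waits, which avoids deadlock but gives no fairness guarantee.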
11
Hybrid Circuit-Switched Networks
Pros:
- Energy-efficient end-to-end transmission
- High bandwidth through WDM
- Electronic network still available for small control messages*
- Network-level support for secure regions
Cons:
- Path setup latency
- Path setup contention (no fairness)
* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]
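The path-setup-latency drawback is amortized as messages grow. A rough Python sketch, assuming (hypothetically) that SETUP and ACK each cross `hops` electronic routers at `hop_latency_ns` per hop, that time of flight and TEARDOWN are off the critical path, and that data is striped across all wavelengths at 10 Gb/s each (the per-wavelength rate used in this deck's power results):

```python
def circuit_transfer_ns(message_bits, hops, hop_latency_ns, n_wavelengths,
                        gbps_per_wavelength=10.0):
    """Rough end-to-end time for one circuit-switched message, in ns.

    Hypothetical model: SETUP + ACK round trip over the electronic plane,
    then data streamed over all wavelengths. Note 1 Gb/s = 1 bit/ns.
    """
    setup_ns = 2 * hops * hop_latency_ns
    data_ns = message_bits / (n_wavelengths * gbps_per_wavelength)
    return setup_ns + data_ns


def link_efficiency(message_bits, hops, hop_latency_ns, n_wavelengths,
                    gbps_per_wavelength=10.0):
    """Fraction of the transfer time spent actually streaming data."""
    data_ns = message_bits / (n_wavelengths * gbps_per_wavelength)
    total_ns = circuit_transfer_ns(message_bits, hops, hop_latency_ns,
                                   n_wavelengths, gbps_per_wavelength)
    return data_ns / total_ns


if __name__ == "__main__":
    # Hypothetical: 8 hops, 2 ns per hop, 7 wavelengths.
    for bits in (100, 10_000, 100_000):
        eff = link_efficiency(bits, hops=8, hop_latency_ns=2.0, n_wavelengths=7)
        print(f"{bits:>7} bits: {eff:.1%} of the time is data transfer")
```

Small messages spend most of their time in setup, which is why the electronic plane is kept for small control messages and why larger transfers suit the circuit-switched photonic plane.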
Programming and Communication
13
Shared Memory
[Figure: spectrum from implicit to explicit communication, with a scaling axis]
“… [OpenMP on large systems] often performs worse than message passing due to a combination of false sharing, coherence traffic, contention, and system issues that arise from the difference in scheduling and network interface moderation” ~ Exascale Report
14
Partitioned Global Address Space
[Figure: spectrum from implicit to explicit communication]
[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]
Access method: network operation
- Local read: optical receive
- Local write: optical send
- Remote read: electronic request, optical receive
- Remote write: optical send
- Shared R/W: ?
15
Message Passing
[Figure: spectrum from implicit to explicit communication]
* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]
- Complex, dynamic access patterns
- Relatively larger blocks of data
- Scientific computing
16
Streaming
[Figure: spectrum from implicit to explicit communication; pipeline stages 1 to 4 turn input data into output data over persistent optical circuits]
- Embedded / specialized systems (Graphics, Image + Signal Proc.)
- Execution mode of general-purpose systems (Cell Processor)
Electronic Plane
18
Electronic Router
[Figure: control router (buffers, crossbar, arbiter, routing logic, flow control with credits, crossbar and data-switch allocation) alongside the data switch, which it drives through crossbar (Xbar) and ring control signals over a request bus]
- Low-frequency operation (~1 GHz)
- 1 VC (typically)
- Small buffers (64-28)
- Narrow Channels (8-32)
19
Network Gateway
[Figure: four cores share one network interface; the interface connects to the control plane (control router, electronic crossbar, bidirectional electronic channels) and to the data plane (5-port photonic switch, bidirectional waveguides); the Tx/Rx path uses serialization and drivers on transmit, receivers and deserialization on receive]
[P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]
External Concentration
The Photonic Plane
21
Silicon Photonic Waveguide Technology
[Vlasov and McNab, Optics Express 12 (8) 1622 (2004)]
[Figure: 1.28 Tb/s data transmission experiment, occupying a small slice of the available waveguide bandwidth; traces (100 ps scale) for channels C23 (1559 nm), C28 (1555 nm), C46 (1541 nm), and C51 (1537 nm) before injection into the waveguide and after a 5-cm waveguide and EDFA]
[B. G. Lee et al., Photon. Technol. Lett. 20 (10) 767 (2008)]
Silicon photonic waveguides provide low-power optical interconnects in CMOS-compatible platform.
Low-loss (1.7 dB/cm), high-bandwidth (> 200 nm) silicon photonic waveguides can be fabricated in commercial CMOS process.
22
Silicon Photonic Modulator and Detector Technology
[M Watts, Group Four Photonics (2008)] [M Lipson, Optics Express (2007)]
Silicon modulators:
- 85 fJ/bit demonstrated at 10 Gb/s
- Scalable to < 25 fJ/bit
- 18 Gb/s demonstrated
[S Koester, J. Lightw. Technol. (2007)]
Ge-on-Si Detectors:
- 40-GHz bandwidths
- 1 A/W responsivities
Receivers (detectors w/ CMOS amplifiers):
- 1.1 pJ/bit demonstrated at 10 Gb/s
- Scalable to < 50 fJ/bit
[Figure: continuous-wave (CW) laser → modulator → detector]
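Energy-per-bit figures translate directly into power at a given line rate, since fJ/bit × Gb/s = 1e-15 J × 1e9 bit/s = 1 µW. A small Python sketch applying that conversion to the demonstrated and scaled figures above (laser power is not included; it is a separate term in the power accounting):

```python
def power_mw(energy_fj_per_bit, rate_gbps):
    """Power (mW) of a device spending energy_fj_per_bit at rate_gbps.

    fJ/bit x Gb/s = 1e-15 J x 1e9 bit/s = 1e-6 W = 1e-3 mW.
    """
    return energy_fj_per_bit * rate_gbps * 1e-3


if __name__ == "__main__":
    devices = {
        "modulator, demonstrated (85 fJ/bit)": 85,
        "modulator, scaled (25 fJ/bit)": 25,
        "receiver, demonstrated (1.1 pJ/bit)": 1100,
        "receiver, scaled (50 fJ/bit)": 50,
    }
    for name, fj in devices.items():
        print(f"{name}: {power_mw(fj, 10):.2f} mW per 10 Gb/s channel")
```

At 10 Gb/s this gives 0.85 mW for the demonstrated modulator and 11 mW for the demonstrated receiver, so receiver scaling is where most of the per-channel savings would come from.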
23
Silicon Photonic Micro-Ring Switch Explanation
[Figure: 2×2 microring switch with inputs in0, in1 and outputs out0, out1; transmission (in_i → out_i) spectra for the bar and cross states]
- Fast control of the resonance wavelength via carrier injection
- No current: on-resonance
- Current: off-resonance
24
Higher Order Switch Designs
25
On-Chip Topology Exploration
- Photonic Torus
- Nonblocking Photonic Torus
[A. Shacham et al., Trans. on Comput., 2008] [M. Petracca et al. IEEE Micro, 2008]
26
On-Chip Topology Exploration
- TorusNX
- Square Root
[J. Chan et al. JLT, May 2010]
27
Photonic Plane Characteristics
- Insertion Loss
- Noise
- Power
28
Insertion Loss and Optical Power Budget
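[Figure: the optical power budget is bounded above by nonlinear effects and below by detector sensitivity; the worst-case insertion loss consumes part of the budget, and the remainder sets the WDM factor]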
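The budget relationship above can be written as a simplified model (an assumption, not necessarily the authors' exact one): each wavelength must reach the detector at or above its sensitivity S after the worst-case insertion loss IL, so it is injected at S + IL (dBm), and the summed power of N wavelengths must stay below the nonlinear limit P_max. That gives 10·log10(N) ≤ P_max − S − IL. A Python sketch, with the dBm limits as hypothetical inputs:

```python
import math


def max_wavelengths(nonlinear_limit_dbm, detector_sensitivity_dbm,
                    worst_case_il_db):
    """Largest WDM factor N that fits inside the optical power budget.

    Simplified model: N * 10**((S + IL) / 10) mW <= 10**(P_max / 10) mW,
    i.e. N <= 10**((P_max - S - IL) / 10).
    """
    budget_db = nonlinear_limit_dbm - detector_sensitivity_dbm
    headroom_db = budget_db - worst_case_il_db
    if headroom_db < 0:
        return 0  # even a single wavelength cannot close the link
    return math.floor(10 ** (headroom_db / 10))


if __name__ == "__main__":
    # Hypothetical limits: 20 dBm nonlinear threshold, -20 dBm sensitivity.
    for il_db in (12.2, 23.2, 31.2):
        print(il_db, "dB IL ->", max_wavelengths(20, -20, il_db), "wavelengths")
```

Because N grows exponentially with the leftover headroom, every dB of insertion loss removed multiplies the admissible wavelength count by about 1.26.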
29
Insertion Loss vs. Bandwidth
[Figure: insertion loss vs. bandwidth trade-off, relating network size and number of λ across topologies]
30
Simulation Results
[Figure: insertion loss (dB) vs. topology size (nodes), broken down into propagation, crossing, and dropping-into-a-ring losses]
- Torus: 4×4 20.6, 6×6 25.6, 8×8 31.2, 10×10 37.0, 12×12 42.8, 14×14 48.6, 16×16 54.5, 18×18 60.3
- Non-blocking torus: 4×4 18.7, 6×6 25.3, 8×8 31.5, 10×10 38.0, 12×12 44.1, 14×14 50.6, 16×16 56.8, 18×18 63.2
- TorusNX: 4×4 15.8, 6×6 19.5, 8×8 23.2, 10×10 27.1, 12×12 31.0, 14×14 34.9, 16×16 38.8, 18×18 42.7
- Square root: 4×4 12.2, 8×8 21.5, 16×16 30.6
31
Simulation Results
[Figure: number of wavelength channels vs. number of access points (log scale) for the torus, non-blocking torus, TorusNX, and square root topologies, with Original and Improved curves and the optical power budget marked]
"Original" is based on the IL results from the previous slide; "Improved" is based on a hypothetical improvement in crossing loss from 0.15 dB to 0.05 dB.
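The insertion-loss plots split each path's loss into propagation, crossing, and ring-drop terms, which is why a lower crossing loss helps crossing-heavy topologies most. A Python sketch of that additive breakdown: the 1.7 dB/cm waveguide loss and the 0.15 dB vs. 0.05 dB crossing losses come from this deck, while the per-drop loss, path length, and crossing count are hypothetical placeholders.

```python
def path_insertion_loss_db(length_cm, n_crossings, n_ring_drops,
                           prop_db_per_cm=1.7, crossing_db=0.15,
                           drop_db=0.5):
    """Insertion loss of one optical path as the sum of three components.

    prop_db_per_cm: 1.7 dB/cm silicon waveguide loss.
    crossing_db: 0.15 dB per waveguide crossing ("original" case).
    drop_db: hypothetical loss per drop into a ring (placeholder value).
    """
    return (length_cm * prop_db_per_cm
            + n_crossings * crossing_db
            + n_ring_drops * drop_db)


if __name__ == "__main__":
    # Hypothetical worst-case path: 2 cm long, 40 crossings, 2 ring drops.
    original = path_insertion_loss_db(2.0, 40, 2)
    improved = path_insertion_loss_db(2.0, 40, 2, crossing_db=0.05)
    print(f"original {original:.1f} dB, improved {improved:.1f} dB")
```

For this hypothetical path the loss drops from 10.4 dB to 6.4 dB; the saving is simply the crossing count times 0.1 dB, so it scales with how many crossings the worst-case path sees.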
32
Photonic Plane Characteristics
- Insertion Loss
- Noise
- Power
33
Noise and Crosstalk
[Figure: noise sources along a photonic path (laser noise, modulation noise, inter-message crosstalk, intra-message crosstalk, crosstalk filter), grouped into coherent and incoherent noise]
34
Effects of Noise
[Figure: optical SNR as a function of network size, number of λ, and network load]
35
Simulation Results
[Figure: optical SNR (dB) vs. message size (10^0 to 10^7 bits) for the torus, non-blocking torus, TorusNX, and square root topologies]
The line at OSNR = 16.9 dB marks where a bit-error rate of 10^-12 can be achieved, assuming an ideal binary receiver circuit and orthogonal signaling.
Results
- Results are plotted for a network size of 8×8, at saturation, at the detectors.
- Maximum OSNR ≈ 45 dB (due to laser noise)
- Minimum OSNR < 17 dB (due to message-to-message crosstalk)
- Variations between networks are due to the varying likelihood of two messages intersecting in the network topology.
System Performance
- SNR measures the likelihood of error-free transmission.
- Lower-SNR designs will require additional retransmissions, resulting in lower throughput.
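One model consistent with the 16.9 dB threshold is the coherent binary orthogonal signaling result BER = Q(√SNR). The Python sketch below assumes the optical SNR can be used directly as that per-bit SNR; it is a check of the threshold, not the authors' noise model.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_from_osnr_db(osnr_db):
    """BER for ideal coherent binary orthogonal signaling: Q(sqrt(SNR)).

    Assumption: the optical SNR equals the receiver's per-bit SNR.
    """
    snr_linear = 10.0 ** (osnr_db / 10.0)
    return q_function(math.sqrt(snr_linear))


def osnr_db_for_ber(target_ber, lo_db=0.0, hi_db=30.0, iterations=100):
    """Smallest OSNR (dB) meeting target_ber, found by bisection.

    BER falls monotonically as OSNR rises, so bisection converges.
    """
    for _ in range(iterations):
        mid_db = 0.5 * (lo_db + hi_db)
        if ber_from_osnr_db(mid_db) > target_ber:
            lo_db = mid_db  # target not met yet: need more SNR
        else:
            hi_db = mid_db  # target met: try a lower SNR
    return hi_db


if __name__ == "__main__":
    print(f"BER at 16.9 dB OSNR: {ber_from_osnr_db(16.9):.2e}")
    print(f"OSNR needed for BER 1e-12: {osnr_db_for_ber(1e-12):.2f} dB")
```

Under this model 16.9 dB gives a BER of about 1.3e-12, and the exact 1e-12 threshold lands within a few hundredths of a dB of the plotted line.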
36
Photonic Plane Characteristics
- Insertion Loss
- Noise
- Power
37
Power Usage
[Figure: microring with n- and p-regions for electronic control (0 V / 1 V) and an ohmic heater for thermal control; transmission vs. wavelength showing the off-resonance and on-resonance profiles relative to the injected wavelengths]
- Laser power
- Active power
  - Modulating
  - Detecting
  - Broadband
- Static power
  - Thermal tuning
- Tx/Rx power
  - Drivers
  - TIAs
38
Energy Per Bit
[Figure: energy per bit (10^-13 to 10^-7 J/bit) vs. message size (10^0 to 10^7 bits) for the torus, non-blocking torus, TorusNX, and square root topologies]
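The falling energy-per-bit curves follow from amortizing per-message costs over more bits. A simplified Python sketch with a hypothetical split: a fixed per-message energy (electronic SETUP/ACK/TEARDOWN traffic and switch configuration) plus a per-bit dynamic energy (modulating and detecting). Both values are placeholders, and static power such as thermal tuning is left out.

```python
def energy_per_bit_j(message_bits, fixed_energy_j, dynamic_energy_j_per_bit):
    """Energy per delivered bit (J/bit) for one circuit-switched message.

    Hypothetical split: fixed_energy_j is paid once per message (path setup
    and teardown); dynamic_energy_j_per_bit is paid for every bit sent.
    """
    return fixed_energy_j / message_bits + dynamic_energy_j_per_bit


if __name__ == "__main__":
    for bits in (10, 1_000, 100_000, 10_000_000):
        e = energy_per_bit_j(bits, fixed_energy_j=1e-10,
                             dynamic_energy_j_per_bit=1e-12)
        print(f"{bits:>10} bits: {e:.2e} J/bit")
```

The curve flattens once the fixed term is negligible, i.e. once the message is much larger than fixed_energy_j / dynamic_energy_j_per_bit bits (100 bits with these placeholder values).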
39
Power Breakdown
- Results are based on randomly generated traffic with message sizes of 100 kbit, with the network in saturation.
- Data was collected on 64-node topologies constrained to a total surface area of 2 cm × 2 cm.
Torus Topology
- Router logic 43%, router buffer 44%, electronic wire 3%, detector 3%, modulator 4%, PSE 2%, thermal 1%
- 7 wavelengths @ 10 Gbps each
- Power dissipation = 1.59 W
Nonblocking Torus Topology
- Router logic 45%, router buffer 44%, electronic wire 2%, detector 2%, modulator 4%, PSE 2%, thermal 1%
- 12 wavelengths @ 10 Gbps each
- Power dissipation = 4.31 W
40
Power Breakdown
Square Root Topology
- Router logic 37%, router buffer 31%, electronic wire 1%, detector 10%, modulator 17%, PSE 1%, thermal 3%
- 38 wavelengths @ 10 Gbps each
- Power dissipation = 3.22 W
TorusNX Topology
- Router logic 34%, router buffer 31%, electronic wire 7%, detector 8%, modulator 14%, PSE 2%, thermal 4%
- 27 wavelengths @ 10 Gbps each
- Power dissipation = 1.89 W
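The pie-chart percentages can be turned into watts to compare the electronic and photonic planes. A Python sketch, assuming each flattened chart belongs to the topology label listed in the same order, and counting detector, modulator, PSE, and thermal power as photonic-plane power (router logic, router buffer, and electronic wire as electronic):

```python
# Flattened pie-chart data from the two power-breakdown slides.
# Assumption: chart order matches the order of the topology labels.
BREAKDOWN = {
    "Torus": (1.59, {"router logic": 43, "router buffer": 44,
                     "electronic wire": 3, "detector": 3,
                     "modulator": 4, "PSE": 2, "thermal": 1}),
    "Nonblocking Torus": (4.31, {"router logic": 45, "router buffer": 44,
                                 "electronic wire": 2, "detector": 2,
                                 "modulator": 4, "PSE": 2, "thermal": 1}),
    "Square Root": (3.22, {"router logic": 37, "router buffer": 31,
                           "electronic wire": 1, "detector": 10,
                           "modulator": 17, "PSE": 1, "thermal": 3}),
    "TorusNX": (1.89, {"router logic": 34, "router buffer": 31,
                       "electronic wire": 7, "detector": 8,
                       "modulator": 14, "PSE": 2, "thermal": 4}),
}

# Assumed classification of components into the photonic plane.
PHOTONIC_COMPONENTS = {"detector", "modulator", "PSE", "thermal"}


def plane_split_watts(topology):
    """Return (electronic W, photonic W) for one topology."""
    total_w, percent = BREAKDOWN[topology]
    photonic_pct = sum(p for name, p in percent.items()
                       if name in PHOTONIC_COMPONENTS)
    electronic_pct = sum(percent.values()) - photonic_pct
    return total_w * electronic_pct / 100, total_w * photonic_pct / 100


if __name__ == "__main__":
    for name in BREAKDOWN:
        elec_w, phot_w = plane_split_watts(name)
        print(f"{name}: electronic {elec_w:.2f} W, photonic {phot_w:.2f} W")
```

Under this classification the electronic control plane accounts for 69% to 91% of the dissipated power across the four topologies.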
41
Performance
Other Interesting Issues
43
Memory Access
[Figure: processor cores, network routers, and memory access points]
[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]
44
Other Arbitration Means - TDM
[G. Hendry et al. Silicon Nanophotonic Network-On-Chip Using TDM Arbitration. In HOTI, Aug. 2010]
45
Wavelength Granularity
[Figure: original design vs. re-design, showing per-wavelength (λ) channels]
- Scalable number of WDM channels
46
Conclusion
- Some applications and programming models are clearly well-suited to a circuit-switched photonic network
- Interesting trade-offs and design space