
SLIDE 1

Re-Architecting DRAM Memory Systems with Silicon Photonics

Scott Beamer [1], Chen Sun [2], Yong-jin Kwon [1], Ajay Joshi [3], Christopher Batten [4], Vladimir Stojanović [2], Krste Asanović [1]

1: University of California, Berkeley, CA
2: Massachusetts Institute of Technology, Cambridge, MA
3: Boston University, Boston, MA
4: Cornell University, Ithaca, NY

International Symposium on Computer Architecture (ISCA), June 21, 2010

slide-2
SLIDE 2

Electrical DRAM is Limited

Memory Controller

Core Core On-Chip Interconnect

Memory Controller

Compute Chip DRAM Chip

On-Chip Interconnect

Bank

Page

slide-3
SLIDE 3

Electrical DRAM is Limited

Memory Controller

Core Core On-Chip Interconnect

Memory Controller

Compute Chip DRAM Chip

On-Chip Interconnect

Bank

Page

Pin-bandwidth on the compute chip

slide-4
SLIDE 4

Electrical DRAM is Limited

Memory Controller

Core Core On-Chip Interconnect

Memory Controller

Compute Chip DRAM Chip

On-Chip Interconnect

Bank

Page

Pin-bandwidth on the compute chip I/O energy to move between chips

slide-5
SLIDE 5

Electrical DRAM is Limited

Memory Controller

Core Core On-Chip Interconnect

Memory Controller

Compute Chip DRAM Chip

On-Chip Interconnect

Bank

Page

Pin-bandwidth on the compute chip I/O energy to move between chips Cross-chip energy within DRAM chip

slide-6
SLIDE 6

Electrical DRAM is Limited

Memory Controller

Core Core On-Chip Interconnect

Memory Controller

Compute Chip DRAM Chip

On-Chip Interconnect

Bank

Page

Pin-bandwidth on the compute chip I/O energy to move between chips Activation energy within DRAM chip Cross-chip energy within DRAM chip

SLIDES 7–10

Solution: Silicon Photonics

[Figure: the same compute chip and DRAM chip, now connected by a photonic link]

Great bandwidth density
Great off-chip energy efficiency
Costs little additional energy to use on-chip after going off-chip
Enables page-size reduction

SLIDE 11

Outline

Technology Background
  Electrical DRAM Technology
  Silicon-Photonic Technology
Re-Architecting DRAM Memory Systems
  Chip-Level
  Bank-Level
Evaluation
Scaling Capacity with Optical Power Guiding

SLIDES 12–17

Current DRAM Structure

[Figure: the DRAM hierarchy, built up one level per slide]

Cell: a single bit at the crossing of a wordline and a bitline
Array Core: a cell array with a row decoder and I/O
Array Block: array cores with a column decoder and helper flip-flops
Bank: array blocks sharing an I/O strip
Chip: multiple banks plus the I/O strip
Channel: a memory controller driving a rank of chips

SLIDE 18

Photonic Technology

Monolithically integrated silicon photonics is being researched by the MIT Center for Integrated Photonic Systems (CIPS).

[Figure: device cross-section showing the backend dielectric, silicon substrate, air gap, STI, and poly-Si waveguide]

Holzwarth et al., CLEO 2008
slide-19
SLIDE 19

Photonic Link

Each wavelength can transmit at 10Gbps Dense Wave Division Multiplexing (DWDM)

64 wavelengths per direction in same media

Off-chip Laser Vertical Coupler Die 1 Die 2

Photo- detector Ring Filter Ring Modulator Fiber Waveguide

Rough Comparison Electrical Photonic Off-Chip I/O Energy (pJ/bit) 5 0.150 Off-Chip BW Density (Tbps/mm2) 1.5 50.000
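To make the table concrete, here is a quick sketch of what those per-bit energies mean at channel scale. Only the pJ/bit figures come from the table; the 512 Gbps channel bandwidth is an assumed example value.

```python
# Rough off-chip I/O power comparison using the slide's energy numbers.
# The 512 Gbps channel bandwidth is a hypothetical example, not a value
# from the talk.
ELECTRICAL_PJ_PER_BIT = 5.0   # electrical off-chip I/O energy (from table)
PHOTONIC_PJ_PER_BIT = 0.15    # photonic off-chip I/O energy (from table)

def io_power_watts(pj_per_bit: float, gbps: float) -> float:
    """I/O power = energy per bit (J) * bits per second."""
    return pj_per_bit * 1e-12 * gbps * 1e9

channel_gbps = 512  # assumed example channel bandwidth
electrical_w = io_power_watts(ELECTRICAL_PJ_PER_BIT, channel_gbps)
photonic_w = io_power_watts(PHOTONIC_PJ_PER_BIT, channel_gbps)
print(f"electrical: {electrical_w:.2f} W, photonic: {photonic_w:.3f} W")
# At 512 Gbps: 2.56 W electrical vs. ~0.077 W photonic, a ~33x gap.
```

The gap in watts scales linearly with bandwidth, which is why the per-bit I/O energy dominates the off-chip power story.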

SLIDE 20

Photonic Summary

Power consumers in a photonic link:

Light generation
Encode/decode (electro-optical conversion)
Thermal tuning

Features we are leveraging:

Better off-chip energy efficiency (bits/J)
Better off-chip bandwidth density (b/s/mm2)
Seamless inter-chip links
Can be built using a mostly standard process

SLIDE 21

Outline (repeat of Slide 11; next section: Re-Architecting DRAM Memory Systems)

SLIDE 22

Photonics to the Chip

[Figure: the Electrical Baseline (E1), where each bank reaches off-chip through electrical off-chip drivers, vs. Photonics Off-Chip with Electrical On-Chip (P1), where the off-chip links are replaced by a photonic data access point]
slide-23
SLIDE 23

Photonics Into the Chip

2 Data Access Points per Column (P2) 8 Data Access Points per Column (P8)

Photonic Data Access Point Bank Photonic Data Access Point Bank

SLIDE 24

Reducing Activate Energy

We want to activate fewer bits while achieving the same access width. Increasing the number of I/Os per array core decreases the page size.

[Figure: within an array block on the DRAM chip, the activated row vs. the data actually accessed, for the initial design and for a design with double the I/Os (and bandwidth)]
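A minimal sketch of this arithmetic, with an assumed access width and row size (illustrative values, not the paper's modeled parameters):

```python
# Why more I/Os per array core reduces activation energy: fewer array
# cores must open a row to supply the same access width, so fewer
# total bits are activated. The sizes below are assumptions.
ACCESS_WIDTH_BITS = 512   # bits delivered per access (assumed)
ROW_BITS_PER_CORE = 512   # bits activated when one core opens a row (assumed)

def activated_bits(ios_per_core: int) -> int:
    """Total bits activated to serve one access."""
    cores_opened = ACCESS_WIDTH_BITS // ios_per_core
    return cores_opened * ROW_BITS_PER_CORE

for ios in (4, 8, 16, 32):
    print(f"{ios:2d} I/Os per core -> {activated_bits(ios)} bits activated")
# Each doubling of I/Os per core halves the activated bits (the page size).
```

Activation energy tracks the number of activated bits, so shrinking the effective page directly cuts the activate component of access energy.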

SLIDE 25

Outline (repeat of Slide 11; next section: Evaluation)

SLIDE 26

Methodology

Photonic model: aggressive and conservative projections
DRAM model: heavily modified CACTI-D
Custom C++ architectural simulator running random traffic to animate the models

The setup is configurable; in this presentation: one chip providing 1 GB of capacity with >500 Gbps of bandwidth from 64 banks.

SLIDE 27

Energy for On/Off-Chip

[Charts: on/off-chip energy results and the chip floorplan]

SLIDE 28

Reducing Row Size

[Charts: results with 4 I/Os per array core vs. 32 I/Os per array core]

SLIDE 29

Latency

Latency is marginally better. Most of the latency is within the array core, and since the array core is mostly unchanged, latency improves only slightly, through reduced serialization latency.

SLIDE 30

Area

[Charts: area with 4 I/Os per array core vs. 32 I/Os per array core]

SLIDE 31

Outline (repeat of Slide 11; next section: Scaling Capacity with Optical Power Guiding)

SLIDES 32–33

Scaling Capacity

Motivation: allow the system to increase capacity without increasing bandwidth.

Shared Photonic Bus (Vantrease et al., ISCA 2008)

[Figure: a laser driving a shared photonic bus through a chain of DRAM chips to the compute chip]

Disadvantage: high path loss, growing exponentially with the number of chips, due to couplers and waveguide losses.
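The growth argument can be made concrete with a small sketch. The per-coupler and per-chip waveguide losses below are assumed placeholder values, not the paper's model; the point is only that a shared bus accumulates loss per chip (exponential in linear power), while a guided bus does not.

```python
# Illustrative optical path-loss comparison for the bus designs.
# Per-component losses are assumed placeholders, not modeled values.
COUPLER_LOSS_DB = 1.0    # loss per coupler crossing, in dB (assumed)
WAVEGUIDE_LOSS_DB = 0.5  # on-chip waveguide loss per chip, in dB (assumed)

def shared_bus_loss_db(n_chips: int) -> float:
    """Shared-bus worst case: light crosses every chip (two couplers
    plus waveguide each) before reaching the last DRAM chip. dB losses
    add, so the linear power penalty grows exponentially with n_chips."""
    return n_chips * (2 * COUPLER_LOSS_DB + WAVEGUIDE_LOSS_DB)

def guided_bus_loss_db(n_chips: int) -> float:
    """Guided bus: only the selected chip's path is lit, so the loss
    is roughly independent of the number of chips on the channel."""
    return 2 * COUPLER_LOSS_DB + WAVEGUIDE_LOSS_DB

def db_to_linear(db: float) -> float:
    """Convert a dB loss into the required linear laser-power factor."""
    return 10 ** (db / 10)

for n in (4, 16):
    print(n, shared_bus_loss_db(n), guided_bus_loss_db(n))
```

With these assumed numbers, 16 chips on a shared bus cost 40 dB (a 10,000x laser-power factor), versus a constant 2.5 dB on the guided bus.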

SLIDE 34

Split Photonic Bus

[Figure: the laser's power split among dedicated paths to each DRAM chip]

Advantage: much lower path loss. Disadvantage: all paths are lit.

SLIDE 35

Guided Photonic Bus

[Figure: optical power guided so that only the selected DRAM chip's path is lit]

Advantage: only one low-loss path is lit.

SLIDE 36

Scaling Results

[Chart: laser energy (pJ/bit, 0.05–0.2) vs. number of PIDRAM chips per channel (5–30), for P1, P8, and P16 on shared, split, and guided buses]

SLIDES 37–44

With Photonics...

10x memory bandwidth for the same power
Higher memory capacity without sacrificing bandwidth
Area neutral
Easily adapted to other storage technologies

We would like to thank the MIT Center for Integrated Photonic Systems (CIPS) for researching the enabling technology, including: Jason Orcutt, Anatoly Khilo, Benjamin Moss, Charles Holzwarth, Miloš Popović, Hanqing Li, Henry Smith, Judy Hoyt, Franz Kärtner, Rajeev Ram, Michael Georgas, Jonathan Leu, John Sun, and Cheryl Sorace.

This work was funded in part by DARPA, Intel, Microsoft, and UC Discovery.

SLIDE 45

Backup Slides

SLIDE 46

Resonant Rings

[Figures: light that is not resonant, resonant light, and resonant light with a drop path; figures inspired by Vantrease et al., ISCA '08]

SLIDES 47–48

Ring Modulators

The modulator uses charge injection to change its resonant wavelength. When resonant light passes, it mostly gets trapped in the ring.

[Figures: a resonant racetrack modulator shown with the modulator off and on]

SLIDE 49

Photonic Components

SLIDE 50

Why 5 pJ/b for Electrical?

Prior work has claimed lower than our forecasted 5 pJ/b for off-chip electrical I/O:

2.24 pJ/b @ 6.25 Gbps (Palmer et al., ISSCC 2007)
1.4 pJ/b @ 10 Gbps (O'Mahony et al., ISSCC 2010)

Some important differences to consider:

We assume 20 Gbps per pin; otherwise the system will definitely be pin limited. At higher data rates it is hard to be as energy efficient: 8–13 pJ/b @ 16 Gbps (Lee et al., JSSC 2009).
The DRAM process has slower transistors, leading to less energy-efficient drivers.
Background energy is averaged in (clocking, fixed energy, less than 100% utilization).
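A quick sketch of the pin-limit argument. The 500 Gbps figure is the bandwidth target from the methodology slide; the per-pin rates are those cited above, and the arithmetic is just a ceiling division.

```python
import math

# How many data pins does a given memory bandwidth require at a given
# per-pin signaling rate? 500 Gbps is the talk's bandwidth target.
def pins_needed(target_gbps: float, gbps_per_pin: float) -> int:
    """Minimum pin count: ceiling of target over per-pin bandwidth."""
    return math.ceil(target_gbps / gbps_per_pin)

print(pins_needed(500, 20))    # 25 pins at the assumed 20 Gbps/pin
print(pins_needed(500, 6.25))  # 80 pins at 6.25 Gbps/pin (Palmer et al. rate)
```

At the lower, more energy-efficient signaling rates, the pin count balloons, which is why the comparison point must be a high per-pin rate even though that costs energy efficiency.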

SLIDE 51

Control Distribution

Control is distributed from the center of the chip; an H-tree spreads it out to the banks. Control lines to inactive banks can be power gated.

[Figures: the electrical baseline with its control H-tree, and the photonic floorplan showing the control access point]

SLIDE 52

Full Energy

[Charts: energy (pJ/bit, 2–10) broken down into laser write, laser read, thermal tuning, fixed circuits, write, read, and activate, for E1 and P1–P64, under aggressive and conservative projections, with three configurations: 64 wavelengths / 4 I/Os, 64 wavelengths / 32 I/Os, and 8 wavelengths / 32 I/Os]

SLIDE 53

Utilization

[Charts: energy (pJ/bit) vs. achieved bandwidth (Gb/s) for E1, P1, P4, P8, and P16, under aggressive and conservative projections, with three configurations: 64 wavelengths / 4 I/Os, 64 wavelengths / 32 I/Os, and 8 wavelengths / 32 I/Os]

SLIDE 54

Full Area

[Chart: area (mm2, up to 180) broken down into I/O overhead, inter-bank overhead, intra-bank overhead, and memory cells, for E1 and P1–P64, with three configurations: 64 wavelengths / 4 I/Os, 64 wavelengths / 32 I/Os, and 8 wavelengths / 32 I/Os]

SLIDE 55

Full Scaling

[Charts: aggressive and conservative scaling results]