Saving energy and increasing density in information processing using photonics



SLIDE 1

Saving energy and increasing density in information processing using photonics

David Miller, Stanford University

For an electronic copy of these slides, please e-mail dabm@ee.stanford.edu

See also D. A. B. Miller, “Attojoule Optoelectronics for Low-Energy Information Processing and Communications: a Tutorial Review,” IEEE/OSA J. Lightwave Technology 35 (3), 343-393 (2017) DOI: 10.1109/JLT.2017.2647779

SLIDE 2

Summary

Growth in the use of information

Limits
  • interconnect density
  • energy, which is mostly from interconnects

Using optics to eliminate the energies of wires

Using optics to solve the interconnect density problem off chips

Using optics to eliminate unnecessary circuits (and their power)

Goal – interconnects from 1 cm to 10 m with ~10 – 100 fJ/bit instead of 1 – 10 pJ/bit

Optics is the only physical way of
  • scaling interconnect density off the chip
  • eliminating this interconnect energy

Stanford Computer Systems Colloquium, April 3, 2019 David Miller

SLIDE 3

Growth in information communication and processing

Both
  • Internet traffic
  • general-purpose computing hardware
have grown ~60% per year, i.e., ~×100 in 10 years

Massive challenge for hardware scaling of
  • Energy
      • Energy per bit has to reduce
      • Energy scaling is not environmentally sustainable: ~4.6 – 9% of electricity in 2012 (Van Heddeghem et al., Computer Comm. 50, 64–76 (2014))
  • Communication density
      • inside systems, already at limits for electrical approaches

[Figure: log-scale plot, 1986 – 2007, of Telecommunications (B/s), Voice Phone (B/s), Internet (B/s) and Gen. Purpose Computing (MIPS); vertical axis from 1.E+06 to 1.E+13]

M. Hilbert and P. Lopez, “The World’s Technological Capacity to Store, Communicate, and Compute Information,” Science 332, 60 (2011)

MIPS – million instructions per second; ~3 – 6 instructions = 1 floating point operation (FLOP)
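The "~60% per year, ~×100 in 10 years" figure is a compound-growth statement, and it can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are made up for this example and assume a constant annual growth rate.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplication after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years


def years_to_reach(annual_rate: float, factor: float) -> float:
    """Years of compounding needed to grow by the given factor."""
    return math.log(factor) / math.log(1.0 + annual_rate)


# 60% per year compounded over a decade is a little over x100.
print(f"Growth over 10 years at 60%/year: x{growth_factor(0.60, 10):.0f}")
print(f"Years to reach x100 at 60%/year: {years_to_reach(0.60, 100):.1f}")
```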

SLIDE 4

The first transatlantic cable (1865)

[Image: William Thomson (Lord Kelvin) (1824 – 1907)]

SLIDE 5

What’s wrong with wires?

Signal gets weaker with distance

Information gets mixed up over long distances
  • “Intersymbol interference”

Wiring density
  • Wires have to be thick for long distance communications

1902 Transpacific cable from Bamfield, Vancouver to Fanning Island. 4000 miles long, ~100 characters/minute (~7 bits per second) with a skilled operator

[Image: Original signal tape from Bamfield cable]

SLIDE 6

Density problem in electrical interconnects

Get universal form of scaling for simple digital connections
  • no repeaters, no multilevel modem techniques

[Figure: two wires of the same shape but different size, each with length $\ell$ and cross-sectional area $A$: this wire carries the same number of bits per second as this wire]

bit rate $B \propto A / \ell^2$

Once the wiring fills all space, the capacity cannot be increased either by making the system smaller or making it larger

Optics completely avoids this scaling limitation
  • no resistive loss
  • small wavelength

J. Parallel and Dist. Comp. 41, 42–52 (1997)
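The point that capacity cannot be increased by making the system smaller or larger follows from the form of the scaling: the bit rate depends only on the ratio of area to length squared, which does not change when every dimension is scaled together. The following is a minimal sketch of that invariance. The prefactor `b0_bits_per_s` is a placeholder chosen for illustration, not a value taken from the slides, and the function names are hypothetical.

```python
import math


def bit_rate_limit(area_m2: float, length_m: float, b0_bits_per_s: float = 1e16) -> float:
    """Capacity bound of the form B = B0 * A / l^2.

    b0_bits_per_s is an assumed placeholder prefactor; only the A / l^2
    dependence comes from the slide.
    """
    return b0_bits_per_s * area_m2 / length_m ** 2


def scaled(area_m2: float, length_m: float, s: float) -> tuple[float, float]:
    """Scale every linear dimension by s: area grows as s^2, length as s."""
    return area_m2 * s ** 2, length_m * s


# Shrinking or enlarging the whole system leaves the bound unchanged.
for s in (0.1, 1.0, 10.0):
    area, length = scaled(1e-4, 0.1, s)
    print(f"scale {s:>4}: B <= {bit_rate_limit(area, length):.3e} b/s")
```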

SLIDE 7

Wiring density

[Figure: chip vertical cross-section (not to scale), showing chip wiring layers, e.g., ~5 microns thick, above transistors of ~10 nm dimensions]

SLIDE 8

ITRS Projected Chip Performance – Bytes/FLOP

Compute power in floating point operations per second (FLOPs)
  • scaled from 2007 chip

Input/Output rate from ITRS (International Technology Roadmap for Semiconductors)
  • scaling number: (# signal pins) × (off-chip clock rate)

[Figure: log-scale plot versus Year, 2005 – 2025, of Compute power (TFLOPs) and Input/Output Rate (TByte/s), both axes spanning 1.00 to 100.00; the growing separation between the FLOPs and I/O rate curves is labeled "Byte/FLOP gap"]

Input/Output interconnect (I/O) rate does not keep up with ability of chip to calculate

Ideal of 1 Byte of memory access for each floating point operation (FLOP) cannot be retained
  • Byte/FLOP gap

DM, “Device Requirements for Optical Interconnects to Silicon Chips,” Proc. IEEE 97, 1166 – 1185 (2009)

SLIDE 9

Energies for communications and computations

Operation                                     Energy per bit
Wireless data                                 10 – 30 µJ
Internet: access                              40 – 80 nJ
Internet: routing                             20 nJ
Internet: optical WDM links                   3 nJ
Reading DRAM                                  5 pJ
Communicating off chip                        1 – 20 pJ
Data link multiplexing and timing circuits    ~2 pJ
Communicating across chip                     600 fJ
Floating point operation                      100 fJ
Energy in DRAM cell                           10 fJ
Switching CMOS gate                           ~50 aJ – 3 fJ
1 electron at 1 V, or 1 photon @ 1 eV         0.16 aJ (160 zJ)

Most energy is used for communications, not logic
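The last row of the table, and the claim that communication dominates logic, can both be reproduced directly from the tabulated numbers. This is only an illustrative check; the variable and function names are assumptions of this sketch.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact SI value of the electron charge


def charge_energy_joules(volts: float, n_electrons: int = 1) -> float:
    """Energy to move n_electrons through a potential difference of `volts`."""
    return n_electrons * ELEMENTARY_CHARGE_C * volts


one_electron_j = charge_energy_joules(1.0)
print(f"1 electron at 1 V: {one_electron_j / 1e-18:.2f} aJ "
      f"({one_electron_j / 1e-21:.0f} zJ)")

# Values copied from the table above, in joules per bit.
off_chip_j = (1e-12, 20e-12)   # communicating off chip: 1 - 20 pJ
flop_j = 100e-15               # floating point operation: 100 fJ

low, high = (e / flop_j for e in off_chip_j)
print(f"One off-chip bit costs {low:.0f}x to {high:.0f}x the energy of one FLOP")
```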

SLIDE 10

Data rates at different length scales

Total long distance internet traffic ~280 Tb/s (Cisco)
  • Equivalent to everyone talking on the phone at once all the time

Traffic on “rack to rack” network inside one large data center ~1 Pb/s (Google)

Graphics processor and server chips
  • peak bandwidth on and off chip ~1.4 Tb/s – 2 Tb/s

Server processor chip on-chip bandwidths
  • on-chip network bandwidth ~4 Tb/s
  • bandwidth in and out of L3 cache ~12.8 Tb/s

DM, JLT 35, 343 (2017)
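Multiplying a data rate by an energy per bit gives a power, which makes these bandwidth figures easier to relate to the energy numbers quoted elsewhere in the talk. The sketch below pairs the ~2 Tb/s off-chip bandwidth with the 1 – 20 pJ/bit off-chip energy and with the 10 – 100 fJ/bit goal. Pairing those particular numbers is a choice made for this example, not a calculation from the slides.

```python
import math


def link_power_watts(bit_rate_bps: float, energy_per_bit_j: float) -> float:
    """Average power needed to move bit_rate_bps at a given energy per bit."""
    return bit_rate_bps * energy_per_bit_j


OFF_CHIP_RATE_BPS = 2e12  # ~2 Tb/s peak off-chip bandwidth

for label, energy_j in [
    ("electrical, 1 pJ/bit", 1e-12),
    ("electrical, 20 pJ/bit", 20e-12),
    ("optical goal, 100 fJ/bit", 100e-15),
    ("optical goal, 10 fJ/bit", 10e-15),
]:
    watts = link_power_watts(OFF_CHIP_RATE_BPS, energy_j)
    print(f"{label:>26}: {watts:.2f} W")
```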

SLIDE 11

Interconnect power

Interconnect power limits chip performance
  • ~50% of microprocessor power was interconnects in 2002, and has likely risen since

System power is financially significant
  • The cost of powering a server is comparable to the purchase cost of the server hardware

Energy for one Google search? ~1 kJ

Server interconnect power is already larger than solar power generation

D. A. B. Miller, Proc. IEEE 97, 1166 – 1185 (2009)

SLIDE 12

Energy and information

Though it does take more energy to send a bit over longer distances
  • there is massively more information sent at shorter distances
  • so much so that most energy dissipation is in shorter links and in interconnects inside machines

SLIDE 13

Power dissipation in electrical wires

Wires always have large capacitance per unit length
  • ~2 pF/cm, i.e., 200 aF/micron

Simple logic-level signaling results in large dissipation
  • Dissipate at least ~¼CV² per bit in on-off signaling
  • E.g., at 2 pF/cm and a 2 cm chip, at 1 V on-off signaling, the energy per bit communicated is at least ~1 pJ

[Figure: an electrical connection (low impedance and/or high capacitance per unit length) between small, high-impedance devices]
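The ~1 pJ figure follows from the ¼CV² estimate with the numbers given on the slide. A minimal sketch of that arithmetic, with illustrative function names, also confirms that 2 pF/cm and 200 aF/micron are the same quantity.

```python
import math


def on_off_energy_per_bit_j(cap_per_cm_f: float, length_cm: float, swing_v: float) -> float:
    """Lower-bound energy per bit, (1/4) C V^2, for on-off signaling on a wire."""
    capacitance_f = cap_per_cm_f * length_cm
    return 0.25 * capacitance_f * swing_v ** 2


def per_cm_to_af_per_micron(cap_per_cm_f: float) -> float:
    """Convert F/cm to aF/micron (1 cm = 1e4 microns, 1 aF = 1e-18 F)."""
    return cap_per_cm_f / 1e4 / 1e-18


energy_j = on_off_energy_per_bit_j(cap_per_cm_f=2e-12, length_cm=2.0, swing_v=1.0)
print(f"Energy per bit across a 2 cm chip at 1 V: {energy_j / 1e-12:.2f} pJ")
print(f"2 pF/cm = {per_cm_to_af_per_micron(2e-12):.0f} aF/micron")
```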

SLIDE 14

Logic and wiring capacitance

Wiring capacitance, even to neighboring gates, is comparable to or greater than the transistor capacitance

Most energy in information processing is in communications, not in logic, even at the gate level

Most energy dissipation in information processing is in charging and discharging wire capacitance
  • which is ~200 aF/micron

Just “touching” a bit typically costs many fJ in CMOS

[Figure: logic gate connected to a wire]

SLIDE 15

Energy and information

The dominant energy dissipation at short distances inside machines is charging and discharging wire capacitance

SLIDE 16

Need to move to optics to save energy

To save energy in the physical process of communications
  • stop wasting energy in charging and discharging electrical lines
  • a fundamental quantum-mechanical advantage of optics: quantum impedance conversion
  • charge the photodetector, not the wire

(Also can continue to increase interconnect density using optics, solving the “byte per FLOP” problem in computer architectures)

SLIDE 17

Quantum impedance conversion

The photoelectric effect means it is possible to generate a “large” voltage in a detector (e.g., a fraction of a volt)
  • with very little signal power or energy
  • and very little classical voltage in the light beam (< 1 mV for 1 nW)
  • “quantum impedance conversion”

Optics only has to charge the photodetector and transistor to the logic voltage, not the interconnect line

[Figure: 1 nW with 1 eV photons gives ~1 nA of photocurrent; across 1 GΩ this gives ~1 V]

DM, Optics Letters 14, 146 (1989)
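The numbers in this example can be reproduced with a short calculation. The sketch below assumes one electron collected per photon (unit quantum efficiency). For the "classical voltage" it assumes the beam power is dissipated in the impedance of free space (~377 Ω). That second assumption is one plausible reading of the "< 1 mV for 1 nW" figure, not something stated on the slide.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19
FREE_SPACE_IMPEDANCE_OHM = 376.73  # assumed reference impedance for the beam


def photocurrent_a(power_w: float, photon_energy_ev: float,
                   quantum_efficiency: float = 1.0) -> float:
    """Photocurrent when each absorbed photon yields quantum_efficiency electrons."""
    photons_per_second = power_w / (photon_energy_ev * ELEMENTARY_CHARGE_C)
    return quantum_efficiency * photons_per_second * ELEMENTARY_CHARGE_C


def classical_beam_voltage_v(power_w: float,
                             impedance_ohm: float = FREE_SPACE_IMPEDANCE_OHM) -> float:
    """Voltage V = sqrt(P * Z) that the same power would develop across impedance Z."""
    return math.sqrt(power_w * impedance_ohm)


current_a = photocurrent_a(power_w=1e-9, photon_energy_ev=1.0)
detector_voltage_v = current_a * 1e9  # across a 1 GOhm load
beam_voltage_v = classical_beam_voltage_v(1e-9)

print(f"Photocurrent: {current_a / 1e-9:.2f} nA")
print(f"Voltage across 1 GOhm: {detector_voltage_v:.2f} V")
print(f"Classical voltage of the 1 nW beam: {beam_voltage_v / 1e-3:.2f} mV")
```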

SLIDE 18

How to do this?

Reduce energy in optoelectronic devices
  • so the energy to send information optically becomes less than that of wires
  • even for short distances, e.g., centimeters or even shorter

Low energy optoelectronic devices
  • Pushing operating energies into the sub-10 fJ or even attojoule range for output devices
      • modulators, LEDs, lasers
      • including advanced nanophotonic structures
  • Integrating sub-fF photodetectors right beside transistors

DM, JLT 35, 343 (2017)

SLIDE 19

Capacitance of small structures for fJ operation

So that capacitive charging energies do not dominate, we need
  • small devices for low device capacitance
  • very close integration to limit wiring capacitance

Structure                                         Capacitance
100×100 µm square conventional photodetector      ~1 pF
5×5 µm CMOS photodetector                         4 fF
Wire capacitance, per µm                          ~200 aF
FinFET input capacitance                          ~20 – 200 aF
1 micron cube of semiconductor                    ~100 aF
100 nm cube of semiconductor                      ~10 aF
10 nm cube of semiconductor                       ~1 aF

DM, JLT 35, 343 (2017)
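The three "cube of semiconductor" rows scale linearly with the cube side, which is what a parallel-plate estimate predicts: with plate area L² and separation L, C = ε₀εᵣL²/L = ε₀εᵣL. The sketch below assumes a relative permittivity of 12, a typical round number for common semiconductors that is not stated on the slide. Under that assumption it reproduces the tabulated orders of magnitude.

```python
EPSILON_0_F_PER_M = 8.8541878128e-12  # vacuum permittivity


def cube_capacitance_f(side_m: float, relative_permittivity: float = 12.0) -> float:
    """Parallel-plate capacitance between opposite faces of a cube of side `side_m`.

    C = eps0 * eps_r * (side^2) / side = eps0 * eps_r * side.
    relative_permittivity = 12 is an assumed, typical semiconductor value.
    """
    return EPSILON_0_F_PER_M * relative_permittivity * side_m


for label, side_m in [("1 micron", 1e-6), ("100 nm", 100e-9), ("10 nm", 10e-9)]:
    print(f"{label:>8} cube: ~{cube_capacitance_f(side_m) / 1e-18:.1f} aF")
```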

SLIDE 20

First Ge quantum well waveguide-integrated modulator

10 microns long, 0.8 microns wide, 500 nm thick intrinsic region
  • On silicon
  • No resonator
  • Selective area growth of quantum wells in SOI waveguides

Capacitance ~3 fF

3 dB modulation with 4 V bias, 1 V swing, 1460 nm
  • Dynamic energy per bit ~0.75 fJ

Tested to 7 Gb/s (equipment limited)

[Figure: device image with 25 μm scale bar, showing Si Waveguide, Ge QW Modulator, Contact Via and High Speed Probe Pads]

  • S. Ren et al., IEEE PTL 24, 461 – 463 (2012)
  • D. A. B. Miller, Optics Express 20, A293-A308 (2012)

Waveguides provided by Kotura

Harris and Miller groups, Stanford
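The quoted dynamic energy is consistent with the ¼CV² on-off signaling estimate used earlier in the talk, applied to the device capacitance and the voltage swing. The slide does not say that this is how the 0.75 fJ figure was obtained, so the following is a consistency check under that assumption:

```latex
E_{\text{bit}} \approx \tfrac{1}{4}\, C\, V_{\text{swing}}^{2}
             = \tfrac{1}{4} \times 3\,\text{fF} \times (1\,\text{V})^{2}
             = 0.75\,\text{fJ}
```

The 4 V bias sets the operating point and does not enter this estimate of the dynamic (switching) energy.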

SLIDE 21

Mask and layout for nanoantenna

SLIDE 22

Need to move to optics to save energy

New additional conclusion: stop wasting energy in the electrical circuits used to run interconnects
  • low energies in optoelectronic devices themselves cannot be exploited effectively if the dissipation in the associated circuits is large
  • e.g., receiver amplifier circuits dissipating 100’s of fJ/bit to pJ’s/bit
  • e.g., time-multiplexing circuitry dissipating pJ’s/bit
      • clock and data recovery (CDR)
      • serialization/deserialization (SERDES)
      • clock distribution

DM, JLT 35, 343 (2017)

SLIDE 23

Eliminating receiver energy

Integrating low capacitance photodetectors beside the transistor input
  • may eliminate need for voltage amplification altogether: “receiverless” operation
  • or limit it to ~one simple low energy gain stage: “near-receiverless” operation

E.g., 1 fJ received optical energy
  • in 1 pF (conventional detector), generates ~1 mV
  • in 30 fF (solder-bumped photodetector), generates ~33 mV
  • in 1 fF (integrated detector), generates ~1 V

DM, JLT 35, 343 (2017)
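The three voltages listed follow from converting the received optical energy to charge and dividing by the capacitance, V = Q/C. The sketch below assumes 1 eV photons and one collected electron per photon, so that 1 fJ of light yields 1 fC of charge. Both are assumptions of this example, chosen to match the round numbers on the slide.

```python
import math


def detector_voltage_v(optical_energy_j: float, capacitance_f: float,
                       photon_energy_ev: float = 1.0,
                       quantum_efficiency: float = 1.0) -> float:
    """Voltage developed when the photogenerated charge is dumped on a capacitance.

    Number of photons = E / (photon_energy_ev * e); each gives quantum_efficiency
    electrons of charge e, so the total charge in coulombs is E / photon_energy_ev.
    """
    charge_c = quantum_efficiency * optical_energy_j / photon_energy_ev
    return charge_c / capacitance_f


RECEIVED_ENERGY_J = 1e-15  # 1 fJ

for label, capacitance_f in [
    ("conventional detector, 1 pF", 1e-12),
    ("solder-bumped photodetector, 30 fF", 30e-15),
    ("integrated detector, 1 fF", 1e-15),
]:
    volts = detector_voltage_v(RECEIVED_ENERGY_J, capacitance_f)
    print(f"{label:>36}: {volts * 1e3:.1f} mV")
```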

SLIDE 24

Eliminating receiver energy

Integration of optoelectronics right beside transistors
  • e.g., within less than a micron, or a few microns at most
  • allows excess capacitance on the scale of only 100’s of aF

Photodetector elements on scales of 1 micron or less dimensions
  • allow detector capacitance of ~100 aF

Transistors themselves have input capacitances ~10’s to 100’s of aF

Hence <1 fF total capacitance is possible with integration

[Figure: photodetectors A and B placed directly beside a transistor (Source, Drain, Channel, Gate, Insulator)]

DM, JLT 35, 343 (2017)

SLIDE 25

Large synchronous systems?

Time delays are not predictable in electronics because of
  • pulse dispersion
  • the temperature coefficient of the resistance of copper

Time delays are very predictable in optics
  • E.g., < 10 ps variation with temperature in 10 m fiber
  • Free-space optics has equal paths for large numbers of beams

Move to synchronous systems?
  • For precision << one clock cycle
      • In optics, only need path lengths controlled to ~cm
      • E.g., cutting fiber to lengths, or free-space imaging
  • Could run ~10 m scale systems with all delays being an integer number of clock cycles
      • Without any clock phase recovery required
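To see why centimeter-level control of path length is enough, it helps to convert a length error into a fraction of a clock cycle. The sketch below assumes a fiber group index of about 1.5 and uses the 2 GHz clock rate quoted elsewhere in the talk as an energy-efficient example. Neither value is given on this slide, and the function names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def fiber_delay_s(length_m: float, group_index: float = 1.5) -> float:
    """Propagation delay through a fiber; group_index ~1.5 is an assumed value."""
    return group_index * length_m / SPEED_OF_LIGHT_M_PER_S


def fraction_of_clock_cycle(length_error_m: float, clock_hz: float,
                            group_index: float = 1.5) -> float:
    """Timing error caused by a path-length error, in units of the clock period."""
    return fiber_delay_s(length_error_m, group_index) * clock_hz


CLOCK_HZ = 2e9  # assumed clock rate

print(f"Delay per cm of fiber: {fiber_delay_s(0.01) / 1e-12:.0f} ps")
print(f"1 cm error at 2 GHz: {fraction_of_clock_cycle(0.01, CLOCK_HZ):.2f} of a cycle")
print(f"10 m of fiber at 2 GHz: {fiber_delay_s(10.0) * CLOCK_HZ:.0f} clock cycles")
```

Under these assumptions a 1 cm error costs about a tenth of a cycle, and a millimeter-scale error about a hundredth, while the < 10 ps thermal variation quoted for 10 m of fiber is about 2% of a cycle.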

SLIDE 26

Number of possible free-space channels

The number of possible optical channels (per polarization) between a transmitting surface of area $A_T$ and a receiving surface of area $A_R$, separated by a distance $L$, at a wavelength $\lambda$, as limited by diffraction, is

$N_C \approx \frac{A_T A_R}{\lambda^2 L^2}$

e.g., at 1 µm wavelength
  • for 10 cm x 10 cm surfaces separated by 10 m, $N_C \approx 10^6$
  • for 2 mm x 2 mm surfaces separated by 2 cm, $N_C \approx 4 \times 10^4$

[Figure: transmitting surface of area $A_T$ and receiving surface of area $A_R$ separated by distance $L$, with solid angles $\Omega_R$ and $\Omega_T$]

In terms of the solid angles marked in the figure:

$N_C \approx \frac{A_R \Omega_T}{\lambda^2} = \frac{A_R A_T}{\lambda^2 L^2}$

$N_C \approx \frac{A_T \Omega_R}{\lambda^2} = \frac{A_T A_R}{\lambda^2 L^2}$

DM, JLT 35, 343 (2017)
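Both example channel counts on this slide can be reproduced by evaluating the diffraction-limited expression, the product of the two areas divided by (wavelength × distance)². The function name below is illustrative.

```python
import math


def free_space_channels(area_t_m2: float, area_r_m2: float,
                        distance_m: float, wavelength_m: float) -> float:
    """Diffraction-limited channel count (per polarization) between two surfaces."""
    return (area_t_m2 * area_r_m2) / (wavelength_m ** 2 * distance_m ** 2)


WAVELENGTH_M = 1e-6  # 1 micron

# 10 cm x 10 cm surfaces separated by 10 m
rack_scale = free_space_channels(0.1 * 0.1, 0.1 * 0.1, 10.0, WAVELENGTH_M)
# 2 mm x 2 mm surfaces separated by 2 cm
chip_scale = free_space_channels(2e-3 * 2e-3, 2e-3 * 2e-3, 2e-2, WAVELENGTH_M)

print(f"10 cm surfaces, 10 m apart: {rack_scale:.1e} channels")
print(f"2 mm surfaces, 2 cm apart: {chip_scale:.1e} channels")
```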

SLIDE 27

Free-space optical system approaches

Free-space optics in large arrays
  • 1000’s or 10,000’s of channels
  • even running at energy efficient clock-rates, e.g., 2 GHz
  • can allow multiple Tb/s on and off chip
  • even in only square millimeters of chip area

Channels can be to
  • neighboring chips on a board
  • or different boards or racks

DM, JLT 35, 343 (2017); see also J. M. Kahn and D. A. B. Miller, “Communications expands its space,” Nature Photonics 11, 5 – 8 (2017) doi:10.1038/nphoton.2016.256
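The "multiple Tb/s" figure follows from multiplying the channel count by the per-channel rate. The sketch below assumes each channel carries one bit per clock cycle (simple on-off signaling with no time-multiplexing), which is an assumption of this example.

```python
def aggregate_bit_rate_bps(channels: int, clock_hz: float, bits_per_clock: int = 1) -> float:
    """Total bit rate of an array of parallel channels.

    bits_per_clock = 1 is an assumed on-off signaling scheme, one bit per cycle.
    """
    return channels * clock_hz * bits_per_clock


CLOCK_HZ = 2e9  # the 2 GHz example clock rate from the slide

for channels in (1_000, 10_000):
    rate = aggregate_bit_rate_bps(channels, CLOCK_HZ)
    print(f"{channels:>6} channels at 2 GHz: {rate / 1e12:.0f} Tb/s")
```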

SLIDE 28

Free-space arrays of beams

We can easily generate large uniform arrays of light beams from one source
  • Diffractive optics has been able to do this for at least 30 years

Aligning an entire array of light beams is not much more difficult than aligning one beam
  • Just have to add array orientation and overall array dilation

And we could servo the alignment in free-space arrays
  • We can align optics and keep it aligned
  • Even in physically varying and demanding situations
  • Think of the servo-ing of the optics in a CD or DVD player

Free-space arrays of beams can have the same time delay to ps levels over millions of pixels

SLIDE 29

2D arrays of 1024 free space channels

E.g., 10 x 10 micron optical “pads”
  • either packed closely
  • or spaced out, and using lenslet arrays

DM, JLT 35, 343 (2017)
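For the closely packed case, the chip area taken by such an array is easy to estimate. The sketch below assumes a square 32 x 32 arrangement with the pads touching (pitch equal to the 10 micron pad size). That is the densest possible layout and an assumption of this example, not a layout given on the slide.

```python
import math


def packed_array_side_m(channels: int, pad_side_m: float) -> float:
    """Side length of a square array of touching square pads (assumed layout)."""
    pads_per_side = math.isqrt(channels)
    if pads_per_side ** 2 != channels:
        raise ValueError("channels must be a perfect square for a square array")
    return pads_per_side * pad_side_m


side_m = packed_array_side_m(channels=1024, pad_side_m=10e-6)
area_mm2 = (side_m * 1e3) ** 2

print(f"Array side: {side_m / 1e-6:.0f} microns")
print(f"Array area: {area_mm2:.4f} mm^2")
```

Under this assumption the whole 1024-channel array occupies roughly a tenth of a square millimeter.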

SLIDE 30

A “straw-man” low energy system approach

Key additional technology
  • use a silicon photonics optical “interposer” layer
      • especially with additional materials, e.g., III-Vs, germanium
  • optical couplers, including
      • optical vias
      • waveguide arrays
      • free-space couplers
  • detectors beside transistors or in the photonics “interposer” layer on top

Key required advance
  • beam and mode couplers with %’s of loss, not dB’s of loss
  • major opportunity for nanophotonics

Goal – 10 fJ/bit up to 10 m distance

“Straw man” system concept exploiting
  • tightly integrated optoelectronics
  • efficient beam couplers
  • free-space communications with 1000’s to 10,000’s of channels

DM, JLT 35, 343 (2017)
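The distinction between "%’s of loss" and "dB’s of loss" is easier to appreciate with the conversion written out. The helper functions below are illustrative and use the standard definition of loss in decibels, -10·log10(transmitted fraction).

```python
import math


def loss_fraction_to_db(loss_fraction: float) -> float:
    """Convert a fractional power loss (0 to <1) to loss in dB."""
    return -10.0 * math.log10(1.0 - loss_fraction)


def db_to_loss_fraction(loss_db: float) -> float:
    """Convert a loss in dB to the fraction of power lost."""
    return 1.0 - 10.0 ** (-loss_db / 10.0)


for percent in (1, 5, 10):
    print(f"{percent:>2}% loss = {loss_fraction_to_db(percent / 100):.2f} dB")

for loss_db in (1, 3):
    print(f"{loss_db} dB loss = {db_to_loss_fraction(loss_db) * 100:.0f}% of power lost")
```

A coupler with a few percent loss corresponds to a small fraction of a dB, whereas a 3 dB coupler discards about half of the optical energy at every interface.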

SLIDE 31

DM, JLT 35, 343 (2017)

SLIDE 32

Conclusions

Information processing is limited by connection density and energy dissipation inside machines
  • Both of which are problems of wires
  • Optics can help solve both of these

There is a lot of headroom with no new physics or mechanisms needed
  • Bandwidth density
      • time-multiplexing, more bits/Hz (though these increase energy per bit)
      • mode-multiplexing
      • free space, e.g., 10,000 – 100,000 channels
  • Total energy per bit communicated could be reduced
      • from 1 – 10 pJ/bit to 10 – 100 fJ/bit
      • with possibilities for even lower energies

DM, JLT 35, 343 (2017)

SLIDE 33

Conclusions

Key challenges
  • Dense integration of optoelectronics with very low capacitance (fF’s to 100’s of aF)
  • Low-loss coupling

Good news
  • These technical steps towards futuristic systems are each worthwhile themselves for nearer term systems

Radical long-term steps
  • Moving away from just conventional fiber, e.g., multicore, mode-multiplexing, free space
  • Change in system design, especially in clocking
      • Synchronous systems?

How soon are we prepared to invest how much to get there?

For a copy of these slides, please e-mail dabm@ee.stanford.edu

DM, JLT 35, 343 (2017)