Saving energy and increasing density in information processing using photonics
David Miller, Stanford University
For an electronic copy of these slides, please e-mail dabm@ee.stanford.edu
See also D. A. B. Miller, Attojoule Optoelectronics
Summary
Growth in the use of information
Limits
- interconnect density
- energy, which is mostly from interconnects
Using optics to eliminate the energies of wires
Using optics to solve the interconnect density problem off chips
Using optics to eliminate unnecessary circuits (and their power)
Goal: interconnects from 1 cm to 10 m with ~ 10 – 100 fJ/bit instead of 1 – 10 pJ/bit
Optics is the only physical way of
- scaling interconnect density off the chip
- eliminating this interconnect energy
Stanford Computer Systems Colloquium, April 3, 2019 David Miller
Growth in information communication and processing
Both
- Internet traffic
- general purpose computing hardware
have grown ~ 60% per year, i.e., ~ ×100 in 10 years
Massive challenge for hardware scaling of
- energy: energy per bit has to reduce
- energy scaling is not environmentally sustainable: ~ 4.6 – 9% of electricity in 2012 (Van Heddeghem et al., Computer Comm. 50, 64–76 (2014))
- communication density inside systems: already at limits for electrical approaches
[Figure: capacity versus year, 1986 to 2007, on a log scale from 1.E+06 to 1.E+13, for Telecommunications (B/s), Voice Phone (B/s), Internet (B/s), and Gen. Purpose Computing (MIPS)]
M. Hilbert and P. Lopez, “The World’s Technological Capacity to Store, Communicate, and Compute Information,” Science 332, 60 (2011)
MIPS: million instructions per second; ~ 3 - 6 instructions = 1 floating point operation (FLOP)
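The "~ 60% per year, ~ ×100 in 10 years" figure above is compound growth, and a minimal sketch can check the arithmetic. The function names are illustrative only and do not come from the slides.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Compound growth factor after `years` at a fractional `annual_rate`."""
    return (1.0 + annual_rate) ** years


def years_to_multiply(annual_rate: float, factor: float) -> float:
    """Years needed to grow by `factor` at a fractional `annual_rate`."""
    return math.log(factor) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # 60% per year compounds to roughly two orders of magnitude per decade.
    print(f"growth over 10 years: x{growth_factor(0.60, 10):.0f}")
    print(f"years to reach x100:  {years_to_multiply(0.60, 100):.1f}")
```

If energy per bit stayed constant, total energy use would have to grow by the same factor, which is why the slide says energy per bit has to reduce.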
The first transatlantic cable (1865)
William Thomson (Lord Kelvin) (1824 – 1907)
What’s wrong with wires?
Signal gets weaker with distance
Information gets mixed up over long distances
- “Intersymbol interference”
Wiring density
- Wires have to be thick for long distance communications
1902 Transpacific cable from Bamfield, Vancouver to Fanning Island: 4000 miles long, ~ 100 characters/minute (~ 7 bits per second) with a skilled operator
[Figure: original signal tape from Bamfield cable]
Density problem in electrical interconnects
Get universal form of scaling for simple digital connections
- no repeaters, no multilevel modem techniques
[Figure: two wires compared; this wire carries the same number of bits per second as this wire]
bit rate $B \propto A / \ell^2$, with $A$ the cross-sectional area of the wiring and $\ell$ its length
Once the wiring fills all space, the capacity cannot be increased either by making the system smaller or by making it larger
Optics completely avoids this scaling limitation
- no resistive loss
- small wavelength
J. Parallel and Dist. Comp. 41, 42–52 (1997)
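The scaling argument on the wiring-density slide can be sketched numerically, assuming the bit-rate limit takes the form B = B0 · A / ℓ² (cross-sectional area A, length ℓ). The constant `b0_bits_per_s` below is a placeholder assumption, not a value stated on the slide. The ratio A / ℓ² is unchanged when every dimension is scaled by the same factor.

```python
def bit_rate_limit(area_m2: float, length_m: float,
                   b0_bits_per_s: float = 1e16) -> float:
    """Bit-rate limit B = B0 * A / l^2 (B0 is an assumed placeholder)."""
    return b0_bits_per_s * area_m2 / length_m ** 2


def scaled_bit_rate(area_m2: float, length_m: float, scale: float) -> float:
    """Same wire with every linear dimension multiplied by `scale`."""
    # Area scales as scale^2 and length as scale, so A / l^2 is unchanged.
    return bit_rate_limit(area_m2 * scale ** 2, length_m * scale)


if __name__ == "__main__":
    area, length = 1e-12, 0.01  # hypothetical 1 um^2 wire, 1 cm long
    for s in (0.1, 1.0, 10.0):
        print(f"scale {s:>4}: {scaled_bit_rate(area, length, s):.3e} bit/s")
```

Shrinking or enlarging the whole system leaves the capacity unchanged, which is the limit the slide says optics avoids.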
Wiring density
[Figure: chip vertical cross-section (not to scale), showing chip wiring layers (e.g., ~ 5 microns thick) and transistors (~ 10 nm dimensions)]
ITRS Projected Chip Performance – Bytes/FLOP
Compute power in floating point operations per second (FLOPs)
- scaled from 2007 chip
Input/Output rate
- from ITRS (International Technology Roadmap for Semiconductors) (scaling number)
- (# signal pins) × (off-chip clock rate)
[Figure: compute power (TFLOPs) and input/output rate (TByte/s) versus year, 2005 to 2025, on log scales from 1 to 100, with the Byte/FLOP gap between the FLOPs and I/O rate curves]
Input/Output interconnect (I/O) rate does not keep up with the ability of the chip to calculate
Ideal of 1 Byte of memory access for each floating point operation (FLOP) cannot be retained
- Byte/FLOP gap
DM, “Device Requirements for Optical Interconnects to Silicon Chips,” Proc. IEEE 97, 1166 - 1185 (2009)
Energies for communications and computations
Operation                                       Energy per bit
Wireless data                                   10 – 30 µJ
Internet: access                                40 – 80 nJ
Internet: routing                               20 nJ
Internet: optical WDM links                     3 nJ
Reading DRAM                                    5 pJ
Communicating off chip                          1 – 20 pJ
Data link multiplexing and timing circuits      ~ 2 pJ
Communicating across chip                       600 fJ
Floating point operation                        100 fJ
Energy in DRAM cell                             10 fJ
Switching CMOS gate                             ~ 50 aJ – 3 fJ
1 electron at 1 V, or 1 photon @ 1 eV           0.16 aJ (160 zJ)
Most energy is used for communications, not logic
Data rates at different length scales
Total long distance internet traffic ~ 280 Tb/s (Cisco)
- equivalent to everyone talking on the phone at once all the time
Traffic on “rack to rack” network inside one large data center ~ 1 Pb/s (Google)
Graphics processor and server chips
- peak bandwidth on and off chip ~ 1.4 Tb/s – 2 Tb/s
Server processor chip on-chip bandwidths
- on-chip network bandwidth ~ 4 Tb/s
- bandwidth in and out of L3 cache ~ 12.8 Tb/s
DM, JLT 35, 343 (2017)
Interconnect power
Interconnect power limits chip performance
- ~ 50% of microprocessor power was interconnects in 2002, and has likely risen since
System power is financially significant
- the cost of powering a server is comparable to the purchase cost of the server hardware
Energy for one Google search? ~ 1 kJ
Server interconnect power is already larger than solar power generation
D. A. B. Miller, Proc. IEEE 97, 1166 - 1185 (2009)
Energy and information
Though it does take more energy to send a bit over longer distances, there is massively more information sent at shorter distances, so much so that most energy dissipation is in shorter links and in interconnects inside machines
Power dissipation in electrical wires
Wires always have large capacitance per unit length
- ~ 2 pF/cm, 200 aF/micron
Simple logic-level signaling results in large dissipation
- dissipate at least ~ ¼CV² per bit in on-off signaling
- e.g., at 2 pF/cm and a 2 cm chip, at 1 V on-off signaling, energy per bit communicated is at least ~ 1 pJ
[Figure: electrical connection between small, high-impedance devices; the connection has low impedance and/or high capacitance per unit length]
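The ~ 1 pJ example on the wire-dissipation slide follows directly from the ¼CV² estimate. A minimal sketch of that arithmetic, with illustrative function and parameter names:

```python
def wire_energy_per_bit(cap_per_cm_farad: float, length_cm: float,
                        voltage_volt: float) -> float:
    """Minimum on-off signaling energy per bit, ~ (1/4) * C * V^2, in joules."""
    total_capacitance = cap_per_cm_farad * length_cm
    return 0.25 * total_capacitance * voltage_volt ** 2


if __name__ == "__main__":
    # Slide example: 2 pF/cm wire, 2 cm chip, 1 V on-off signaling.
    energy = wire_energy_per_bit(2e-12, 2.0, 1.0)
    print(f"energy per bit: {energy * 1e12:.2f} pJ")
```

The energy scales linearly with wire length and quadratically with voltage, so lowering the signaling voltage helps more than shortening the wire by the same factor.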
Logic and wiring capacitance
Wiring capacitance even to neighboring gates is comparable to or greater than the transistor capacitance
- most energy in information processing is in communications, not in logic, even at the gate level
Most energy dissipation in information processing is in charging and discharging wire capacitance
- which is ~ 200 aF/micron
Just “touching” a bit typically costs many fJ in CMOS
[Figure: logic gate and wire]
Energy and information
The dominant energy dissipation at short distances inside machines is charging and discharging wire capacitance
Need to move to optics to save energy
To save energy in the physical process of communications
- stop wasting energy in charging and discharging electrical lines
- a fundamental quantum-mechanical advantage of optics: quantum impedance conversion
- charge the photodetector, not the wire
(Also can continue to increase interconnect density using optics, solving the “byte per FLOP” problem in computer architectures)
Quantum impedance conversion
The photoelectric effect means it is possible to generate a “large” voltage in a detector (e.g., a fraction of a volt)
- with very little signal power or energy
- and very little classical voltage in the light beam (< 1 mV for 1 nW)
- “quantum impedance conversion”
Optics only has to charge the photodetector and transistor to the logic voltage, not the interconnect line
[Figure: 1 nW with 1 eV photons gives ~ 1 nA, which gives ~ 1 V across 1 GΩ]
DM, Optics Letters 14, 146 (1989)
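The numbers on the quantum impedance conversion slide can be reproduced with a short sketch. It assumes one electron collected per photon (unit quantum efficiency). It also assumes the "classical voltage" of the beam is estimated as sqrt(P · Z0), with Z0 the impedance of free space. Neither assumption is spelled out on the slide, and the function names are illustrative.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs
FREE_SPACE_IMPEDANCE_OHM = 376.730313668


def photocurrent(power_w: float, photon_energy_ev: float,
                 quantum_efficiency: float = 1.0) -> float:
    """Detector current for a given optical power (assumed unit efficiency)."""
    photons_per_second = power_w / (photon_energy_ev * ELEMENTARY_CHARGE_C)
    return quantum_efficiency * photons_per_second * ELEMENTARY_CHARGE_C


def classical_voltage(power_w: float,
                      impedance_ohm: float = FREE_SPACE_IMPEDANCE_OHM) -> float:
    """Voltage for the same power in a line of the given impedance (assumption)."""
    return math.sqrt(power_w * impedance_ohm)


if __name__ == "__main__":
    current = photocurrent(1e-9, 1.0)        # 1 nW of 1 eV photons
    print(f"photocurrent:      {current * 1e9:.2f} nA")
    print(f"across 1 GOhm:     {current * 1e9:.2f} V")
    print(f"classical voltage: {classical_voltage(1e-9) * 1e3:.2f} mV")
```

The detector turns the same nanowatt into a logic-level voltage because it counts photons rather than measuring the field amplitude.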
How to do this?
Reduce energy in optoelectronic devices
- so the energy to send information optically becomes less than that of wires, even for short distances, e.g., centimeters or even shorter
Low energy optoelectronic devices
- pushing operating energies into the sub-10 fJ or even attojoule range for output devices: modulators, LEDs, lasers, including advanced nanophotonic structures
- integrating sub-fF photodetectors right beside transistors
DM, JLT 35, 343 (2017)
Capacitance of small structures for fJ operation
So that capacitive charging energies do not dominate, we need
- small devices for low device capacitance
- very close integration to limit wiring capacitance
Structure                                         Capacitance
100×100 µm square conventional photodetector      ~ 1 pF
5×5 µm CMOS photodetector                         4 fF
Wire capacitance, per µm                          ~ 200 aF
FinFET input capacitance                          ~ 20 – 200 aF
1 micron cube of semiconductor                    ~ 100 aF
100 nm cube of semiconductor                      ~ 10 aF
10 nm cube of semiconductor                       ~ 1 aF
DM, JLT 35, 343 (2017)
First Ge quantum well waveguide-integrated modulator
10 microns long, 0.8 microns wide, 500 nm thick intrinsic region
- on silicon
- no resonator
Selective area growth of quantum wells in SOI waveguides
Capacitance ~ 3 fF
3 dB modulation with 4 V bias, 1 V swing, 1460 nm
Dynamic energy per bit ~ 0.75 fJ
Tested to 7 Gb/s (equipment limited)
[Figure: Si waveguide, Ge QW modulator, contact via, and high speed probe pads; scale bar 25 μm]
S. Ren et al., IEEE PTL 24, 461 – 463 (2012)
D. A. B. Miller, Optics Express 20, A293-A308 (2012)
Waveguides provided by Kotura
Harris and Miller groups, Stanford
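The quoted dynamic energy for the modulator is numerically consistent with the ¼CV² estimate used earlier for wires, applied to the 3 fF capacitance and 1 V swing. The sketch below assumes that is how the figure was obtained. The slide does not say so explicitly, and the function name is illustrative.

```python
def dynamic_energy_per_bit(capacitance_farad: float,
                           swing_volt: float) -> float:
    """Dynamic energy per bit, assumed to be (1/4) * C * V_swing^2, in joules."""
    return 0.25 * capacitance_farad * swing_volt ** 2


if __name__ == "__main__":
    # Slide values: ~ 3 fF device capacitance, 1 V swing.
    energy = dynamic_energy_per_bit(3e-15, 1.0)
    print(f"dynamic energy per bit: {energy * 1e15:.2f} fJ")
```

Only the voltage swing enters this estimate. The static 4 V bias does not, under the assumption stated above.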
Mask and layout for nanoantenna
Need to move to optics to save energy
New additional conclusion: stop wasting energy in the electrical circuits used to run interconnects
- low energies in optoelectronic devices themselves cannot be exploited effectively if the dissipation in the associated circuits is large
- e.g., receiver amplifier circuits dissipating 100’s of fJ/bit to pJ’s/bit
- e.g., time-multiplexing circuitry dissipating pJ’s/bit: clock and data recovery (CDR), serialization/deserialization (SERDES), clock distribution
DM, JLT 35, 343 (2017)
Eliminating receiver energy
Integrate low capacitance photodetectors beside transistor input
- may eliminate need for voltage amplification altogether: receiverless operation
- or limit it to ~ one simple low energy gain stage: “near-receiverless” operation
E.g., 1 fJ received optical energy
- in 1 pF (conventional detector), generates ~ 1 mV
- in 30 fF (solder-bumped photodetector), generates ~ 33 mV
- in 1 fF (integrated detector), generates ~ 1 V
DM, JLT 35, 343 (2017)
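The three voltages listed for 1 fJ of received optical energy follow from V = Q / C. A minimal sketch, assuming 1 eV photons and one collected electron per photon (so 1 fJ of light yields 1 fC of charge). These assumptions and the function name are illustrative, not stated on the slide.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def detector_voltage(optical_energy_j: float, capacitance_farad: float,
                     photon_energy_ev: float = 1.0,
                     quantum_efficiency: float = 1.0) -> float:
    """Voltage swing V = Q / C from absorbed optical energy (assumed 1 eV photons)."""
    photons = optical_energy_j / (photon_energy_ev * ELEMENTARY_CHARGE_C)
    charge = quantum_efficiency * photons * ELEMENTARY_CHARGE_C
    return charge / capacitance_farad


if __name__ == "__main__":
    cases = [("conventional detector", 1e-12),
             ("solder-bumped photodetector", 30e-15),
             ("integrated detector", 1e-15)]
    for name, capacitance in cases:
        volts = detector_voltage(1e-15, capacitance)
        print(f"{name:28s} {volts * 1e3:8.1f} mV")
```

At 1 fF the same femtojoule reaches a logic-level swing, which is why amplification may be unnecessary.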
Eliminating receiver energy
Integration of optoelectronics right beside transistors
- e.g., within < a micron or a few microns at most
- allows excess capacitance in the scale of only 100’s of aF
Photodetector elements on scales of 1 micron or less dimensions
- allow detector capacitance of ~ 100 aF
Transistors themselves have input capacitances ~ 10’s to 100’s of aF
Hence < 1 fF total capacitance is possible with integration
[Figure: photodetectors A and B beside a transistor (channel, gate, insulator, source, drain)]
DM, JLT 35, 343 (2017)
Large synchronous systems?
Time delays are not predictable in electronics because of
- pulse dispersion
- the temperature coefficient of the resistance of copper
Time delays are very predictable in optics
- e.g., < 10 ps variation with temperature in 10 m fiber
- free-space optics has equal paths for large numbers of beams
Move to synchronous systems?
- for precision << one clock cycle, in optics only need path lengths controlled to ~ cm, e.g., cutting fiber to lengths, or free-space imaging
- could run ~ 10 m scale systems with all delays being an integer number of clock cycles, without any clock phase recovery required
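The claim that centimeter-level path control is enough can be checked with a rough timing sketch. It assumes a fiber group index of about 1.47 (a typical value for silica fiber, not given on the slides). It uses a 2 GHz clock, which the deck mentions elsewhere as an energy-efficient clock rate. Function names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def propagation_delay_s(length_m: float, group_index: float = 1.47) -> float:
    """Propagation delay over `length_m` of fiber (group index is an assumption)."""
    return group_index * length_m / SPEED_OF_LIGHT_M_PER_S


def skew_in_clock_cycles(length_error_m: float, clock_hz: float,
                         group_index: float = 1.47) -> float:
    """Timing error from a path-length error, as a fraction of a clock cycle."""
    return propagation_delay_s(length_error_m, group_index) * clock_hz


if __name__ == "__main__":
    clock_hz = 2e9
    print(f"1 cm of fiber:  {propagation_delay_s(0.01) * 1e12:.0f} ps")
    print(f"1 cm at 2 GHz:  {skew_in_clock_cycles(0.01, clock_hz):.2f} cycle")
    print(f"10 m of fiber:  {propagation_delay_s(10.0) * clock_hz:.0f} cycles")
```

Under these assumptions a 1 cm length error costs about a tenth of a 2 GHz clock cycle, so millimeter-scale cutting would be needed for precision well below one cycle at higher clock rates.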
Number of possible free-space channels
The number of possible optical channels (per polarization) between a transmitting surface of area $A_T$ and a receiving surface of area $A_R$, separated by a distance $L$, at a wavelength $\lambda$, as limited by diffraction, is
$$N_C \simeq \frac{A_T A_R}{\lambda^2 L^2}$$
Equivalently, with $\Omega_R = A_R / L^2$ the solid angle subtended by the receiving surface and $\Omega_T = A_T / L^2$ the solid angle subtended by the transmitting surface,
$$N_C \simeq \frac{A_T \Omega_R}{\lambda^2} = \frac{A_R \Omega_T}{\lambda^2} = \frac{A_T A_R}{\lambda^2 L^2}$$
e.g., at 1 µm wavelength
- for 10 cm x 10 cm surfaces separated by 10 m, $N_C \sim 10^6$
- for 2 mm x 2 mm surfaces separated by 2 cm, $N_C \sim 4 \times 10^4$
DM, JLT 35, 343 (2017)
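The two examples on the channel-count slide can be reproduced from the diffraction-limited estimate N_C ≈ A_T · A_R / (λ² · L²). The sketch below only evaluates that expression. The function name is illustrative.

```python
def free_space_channels(area_tx_m2: float, area_rx_m2: float,
                        wavelength_m: float, distance_m: float) -> float:
    """Diffraction-limited channel count (per polarization) between two surfaces."""
    return area_tx_m2 * area_rx_m2 / (wavelength_m ** 2 * distance_m ** 2)


if __name__ == "__main__":
    wavelength = 1e-6  # 1 micron
    # 10 cm x 10 cm surfaces separated by 10 m
    large = free_space_channels(0.1 * 0.1, 0.1 * 0.1, wavelength, 10.0)
    # 2 mm x 2 mm surfaces separated by 2 cm
    small = free_space_channels(2e-3 * 2e-3, 2e-3 * 2e-3, wavelength, 0.02)
    print(f"10 cm surfaces, 10 m apart: {large:.1e} channels")
    print(f"2 mm surfaces, 2 cm apart:  {small:.1e} channels")
```

Because the count goes as 1/λ², the short wavelength of light is what allows so many channels through a small aperture.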
Free-space optical system approaches
Free-space optics in large arrays
- 1000’s or 10,000’s of channels
- even running at energy efficient clock rates, e.g., 2 GHz
- can allow multiple Tb/s on and off chip, even in only square millimeters of chip area
Channels can be to
- neighboring chips on a board
- or different boards or racks
DM, JLT 35, 343 (2017)
See also J. M. Kahn and D. A. B. Miller, “Communications expands its space,” Nature Photonics 11, 5 – 8 (2017) doi:10.1038/nphoton.2016.256
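The "multiple Tb/s" figure follows from multiplying the channel count by the clock rate. A minimal sketch, assuming one bit per channel per clock cycle, and using the deck's 10 fJ/bit goal to estimate the link power. Both the one-bit-per-cycle assumption and the function names are illustrative.

```python
def aggregate_bit_rate(channels: int, clock_hz: float,
                       bits_per_cycle: float = 1.0) -> float:
    """Total bit rate of an array of channels (assumes 1 bit per cycle each)."""
    return channels * clock_hz * bits_per_cycle


def link_power_w(bit_rate_bps: float, energy_per_bit_j: float) -> float:
    """Power dissipated at a given bit rate and energy per bit."""
    return bit_rate_bps * energy_per_bit_j


if __name__ == "__main__":
    for channels in (1_000, 10_000):
        rate = aggregate_bit_rate(channels, 2e9)   # 2 GHz clock
        power = link_power_w(rate, 10e-15)         # 10 fJ/bit goal
        print(f"{channels:6d} channels: {rate / 1e12:5.1f} Tb/s, "
              f"{power * 1e3:6.1f} mW at 10 fJ/bit")
```

At the 1 – 10 pJ/bit of current links the same bit rates would cost 100 to 1000 times more power.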
Free-space arrays of beams
We can easily generate large uniform arrays of light beams from one source
- diffractive optics has been able to do this for at least 30 years
Aligning an entire array of light beams is not much more difficult than aligning one beam
- just have to add array orientation and overall array dilation
And we could servo the alignment in free-space arrays
- we can align optics and keep it aligned, even in physically varying and demanding situations
- think of the servo-ing of the optics in a CD or DVD player
Free-space arrays of beams can have the same time delay to ps levels over millions of pixels
2D arrays of 1024 free space channels
E.g., 10 x 10 micron optical “pads”, either packed closely or spaced out, and using lenslet arrays
DM, JLT 35, 343 (2017)
A “straw-man” low energy system approach
Key additional technology
- use a silicon photonics optical “interposer” layer, especially with additional materials, e.g., III-Vs, germanium
- optical couplers, including optical vias, waveguide arrays, and free-space couplers
- detectors beside transistors or in the photonics “interposer” layer on top
Key required advance
- beam and mode couplers with %’s of loss, not dB’s of loss
- major opportunity for nanophotonics
Goal: 10 fJ/bit up to 10 m distance
“Straw man” system concept exploiting
- tightly integrated optoelectronics
- efficient beam couplers
- free-space communications with 1000’s to 10,000’s of channels
DM, JLT 35, 343 (2017)
Conclusions
Information processing is limited by connection density and energy dissipation inside machines
- both of which are problems of wires
Optics can help solve both of these
- there is a lot of headroom, with no new physics or mechanisms needed
Bandwidth density
- time-multiplexing, more bits/Hz, though these increase energy per bit
- mode-multiplexing
- free space, e.g., 10,000 – 100,000 channels
Total energy per bit communicated could be reduced
- from 1 – 10 pJ/bit to 10 – 100 fJ/bit, with possibilities for even lower energies
DM, JLT 35, 343 (2017)
Conclusions
Key challenges
- dense integration of optoelectronics with very low capacitance (fF’s to 100’s of aF)
- low-loss coupling
Good news
- these technical steps towards futuristic systems are each worthwhile themselves for nearer term systems
Radical long-term steps
- moving away from just conventional fiber, e.g., multicore, mode-multiplexing, free space
- change in system design, especially in clocking: synchronous systems?
How soon are we prepared to invest how much to get there?