Multiplying Moore's Law with Proximity Communication, Robert Drost (PowerPoint presentation)



SLIDE 1

Multiplying Moore's Law with Proximity Communication

Robert Drost, Ph.D. Director and Distinguished Engineer

Sun Microsystems Laboratories

SLIDE 2

Outline

  • The Bandwidth Motivation
  • Proximity Communication Technology
  • Multiplying Moore's Law
SLIDE 3

The Team

VLSI Research Group at Sun Labs

Igor Benko, Alex Chow, Wes Clark, Bill Coates, Robert Drost, Jo Ebergen, Scott Fairbanks, Jonathan Gainsley, Gilda Garreton, Yaeko Hirotsuka, Ron Ho, David Hopkins, Ian Jones, Russell Kao, Jon Lexau, Dimitri Nadezhin, Tarik Ono, Steve Rubin, Jeff Rulifson, Justin Schauer, Ivan Sutherland, and friends: David Harris, Mark Greenstreet, Ken Yang

And many others at Sun

SLIDE 4

Why do we want more off-chip bandwidth anyway?

SLIDE 5

Motivation: CPU vs. DRAM

J.L. Hennessy and D.A. Patterson, Computer Organization and Design, 2nd ed.

SLIDE 6

Motivation: BBW (Bisection Bandwidth) vs. Flops

[Figure: Performance (TFlops/sec) and Bisection Bandwidth (TBytes/sec) on logarithmic axes, with guide lines at 10, 1, 0.1, 0.01, and 0.001 byte/flop and an arrow labeled "More bandwidth/flop". System classes: Vector, MPP, Thin-node Cluster, Fat-node Cluster. Labeled systems: Colsa Mach5, SX-8, ¼ Blue Gene/L, Blue Gene/L (2005), Sandia Red Storm, ASCI-Q, LLNL Thunder, NCSA Tungsten, NASA Columbia, Earth Sim.]

(Ref 1)
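The guide lines on this chart are lines of constant bytes/flop, that is, bisection bandwidth divided by performance. A minimal sketch of that ratio follows. The function name and the two sample machines are hypothetical and the numbers are illustrative only; none of them are read from the chart.

```python
def bytes_per_flop(bisection_tbytes_per_s: float, tflops: float) -> float:
    """Ratio of bisection bandwidth (TBytes/s) to performance (TFlops)."""
    if tflops <= 0:
        raise ValueError("performance must be positive")
    return bisection_tbytes_per_s / tflops


# Hypothetical machines: (bisection bandwidth in TBytes/s, performance in TFlops).
# These values are placeholders, not data points from the slide.
machines = {
    "bandwidth_rich": (10.0, 10.0),
    "bandwidth_poor": (0.1, 100.0),
}

ratios = {name: bytes_per_flop(bw, perf) for name, (bw, perf) in machines.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:g} bytes/flop")
```

A machine that adds flops without adding bisection bandwidth moves toward the lower bytes/flop lines, which is the trend the slide is concerned about.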

SLIDE 7

Bandwidth versus Memory Capacity

(Ref 1)

SLIDE 8

Motivation: Lack of Data Locality

[Figure: processor-pair communication maps for two workloads, Dense Linear Algebra and 3D FFT. Black = no processor pair communication; White = heavy processor pair communication.]

(Ref 2)

SLIDE 9

Proximity Communication Technology

SLIDE 10

Proximity Communication

[Figure labels: Chip1, Chip2, Chip3; Transmit and Receive circuits]

  • Avoids Off-Chip Wires
  • Increases Bandwidth/Area
  • Makes Chips Replaceable
  • Enhances Testing Capability
  • Enables Smaller Chips
  • Obviates ESD Protection
  • Shrinks Transceiver Circuits

[Plot: Proximity I/O (Proximity Communication) compared with Area Ball Bonding by Year, 2003 to 2009, on a logarithmic scale from 10 to 1000; annotations: 12 μm, 15 μm]

SLIDE 11

Simple Circuits:

SLIDE 12

Proximity Packaging Challenges

[Figure labels: Heat Extraction; Power Connection; Alignment; Force Vector]

  • Performance is a function of Z, Ψ, Φ misalignments
  • With reasonable misalignment control, tens of Tbps bandwidth per chip can be realized

SLIDE 13

Alignment is Multi-Dimensional

[Figure: Chip1 and Chip2 with the six alignment axes X, Y, Z, Θ, Ψ, Φ]

SLIDE 14

  • Must align chips in X, Y, Θ, Z, Ψ, Φ
  • X, Y, Θ misalignments are corrected electronically

[Figure labels: Chip1, Chip2; Tstrobe, Rstrobe; X Vernier, Y Vernier; Rx Pads; Tx Micropads (active and inactive)]

Measure and correct on-chip

Alignment is the major challenge

SLIDE 15

Steering Circuit

[Figure labels: One Receiver Pad Pitch; Tx pad; steering controls C1, C2, B1, B2]

Steering in two dimensions
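The two slides above describe measuring X/Y misalignment with verniers and then steering data to whichever transmit micropads sit under each receiver pad. The sketch below illustrates that idea only. It assumes a square grid of micropads; `steer`, `active_micropads`, the pitch, and the offsets are hypothetical names and values chosen for illustration, not the circuit or dimensions from the talk.

```python
def steer(offset_x_um: float, offset_y_um: float, micropad_pitch_um: float):
    """Return the whole-micropad shift (columns, rows) that best cancels a
    measured X/Y misalignment, plus the residual error left uncorrected."""
    if micropad_pitch_um <= 0:
        raise ValueError("pitch must be positive")
    shift_x = round(offset_x_um / micropad_pitch_um)
    shift_y = round(offset_y_um / micropad_pitch_um)
    residual_x = offset_x_um - shift_x * micropad_pitch_um
    residual_y = offset_y_um - shift_y * micropad_pitch_um
    return (shift_x, shift_y), (residual_x, residual_y)


def active_micropads(shift, pads_per_side: int):
    """Grid coordinates of the micropads to drive for a receiver pad that
    nominally covers columns and rows 0..pads_per_side-1."""
    sx, sy = shift
    return [(col + sx, row + sy)
            for row in range(pads_per_side)
            for col in range(pads_per_side)]


# Hypothetical measurement: receiver landed 13 um right and 4 um down,
# with an assumed 5 um micropad pitch.
shift, residual = steer(offset_x_um=13.0, offset_y_um=-4.0, micropad_pitch_um=5.0)
print("shift (micropads):", shift)
print("residual (um):", residual)
print("driven micropads:", active_micropads(shift, pads_per_side=2))
```

Because the shift is quantized to whole micropads, the residual in each axis is at most half a micropad pitch, so in this model a finer micropad grid leaves less uncorrected X/Y error.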

SLIDE 16

Pads Cross-Section

[Figure labels: Chip 1, Chip 2; Transmitter Plates; Receiver Plates; Plate Separation; 50 μm]

SLIDE 17

Signal and Noise Simulated Coupling

Combining estimates for

  • Channel speed
  • Receiver sensitivity
  • Signal vs. noise for pads
  • Clocking and overhead

We can estimate...

Where G=pad separation, or gap in microns

SLIDE 18

A tileable PxC block

[Figure: block of alternating Rx and Tx data channels, a clock channel (Tx and Rx), and four Align structures]

SLIDE 19

Measured results

  • TSMC 180nm CMOS
  • 72 transmit, 72 receive channels
  • 1.8 Gb/s per channel, 10^-15 bit error rate
  • Aggregate 260 Gb/s/chip, density 430 Gb/s/mm²
  • 3pJ/bit
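The measured figures above can be cross-checked with simple arithmetic. The sketch below assumes that the aggregate counts both transmit and receive channels and that 3 pJ/bit applies to every bit moved. The implied area, power, and error interval are derived here for illustration and are not stated on the slide.

```python
TX_CHANNELS = 72
RX_CHANNELS = 72
RATE_GBPS = 1.8                 # per channel
DENSITY_GBPS_PER_MM2 = 430.0
ENERGY_PJ_PER_BIT = 3.0
BIT_ERROR_RATE = 1e-15

# Aggregate bandwidth, assuming transmit and receive channels both count.
aggregate_gbps = (TX_CHANNELS + RX_CHANNELS) * RATE_GBPS

# Pad area implied by the quoted density (derived, not from the slide).
implied_area_mm2 = aggregate_gbps / DENSITY_GBPS_PER_MM2

# Power if every bit costs 3 pJ: (bits/s) * (J/bit).
power_w = aggregate_gbps * 1e9 * ENERGY_PJ_PER_BIT * 1e-12

# Mean time between bit errors on one channel at the quoted error rate.
seconds_per_error = 1.0 / (BIT_ERROR_RATE * RATE_GBPS * 1e9)

print(f"aggregate: {aggregate_gbps:.1f} Gb/s")
print(f"implied area: {implied_area_mm2:.2f} mm^2")
print(f"power: {power_w:.2f} W")
print(f"one error per channel every {seconds_per_error / 86400:.1f} days")
```

Under these assumptions, 144 channels at 1.8 Gb/s give about 259 Gb/s, consistent with the quoted 260 Gb/s/chip, over roughly 0.6 mm² and at under 1 W.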
SLIDE 20

Experimental Setup

[Figure labels: PCB1, PCB2, Chip1, Chip2]

SLIDE 21

BER vs. chip separation

SLIDE 22

Eye opening at 1.8Gb/s

SLIDE 23

How do we multiply Moore's Law?

SLIDE 24

The Key Idea in Moore's Law

  • Double number of transistors/chip (for same cost) every 24 months

> The principal driving force behind the past 40 years of integrated circuit industry advancement

> An amazing prediction in 1965 based on fewer than a hundred transistors/chip
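Doubling every 24 months is compound growth, which can be written as a one-line function. In the sketch below, the function name, the starting counts, and the starting years are hypothetical values chosen to keep the arithmetic easy to follow. They are not historical data.

```python
def moores_law(transistors_at_start: float, start_year: float, year: float,
               doubling_months: float = 24.0) -> float:
    """Transistors/chip after repeated doubling every `doubling_months`."""
    doublings = (year - start_year) * 12.0 / doubling_months
    return transistors_at_start * 2.0 ** doublings


# Hypothetical baseline: 1,000 transistors/chip in 1970.
# Twenty years is ten doublings, a factor of 1,024.
projected = moores_law(1_000, 1970, 1990)
print(f"{projected:,.0f} transistors/chip")
```

Ten doublings multiply the count by about a thousand, so each 20-year span on a log-scale chart of transistors per chip covers three decades of growth.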

SLIDE 25

The Key Idea in Proximity Comm.

  • We connect chips with enough bandwidth that they can perform as a single integrated chip
  • Hence, PxC increases the effective number of transistors/chip over and above Moore's Law

SLIDE 26

Multiplying Moore's Law

  • Assuming Moore's Law continues

[Plot: Transistors per Chip (1,000 to 1,000,000,000, logarithmic) versus year (1970 to 2020). Curves: Moore's Law scaling; PxC Arrays with increasing chip counts.]

SLIDE 27

What if Moore's Law stalls?

  • Many have (incorrectly) predicted demise of Moore's Law
  • Technical causes

> Short channel effects in transistors leading to too much leakage and hence power consumption

> Wire delay limiting performance

  • Financial causes

> Fabs cost too much to yield a return on investment

  • 65nm fabs cost $3 Billion to build (and going up 2x per generation)

> Chips cost too much to yield a return on investment

SLIDE 28

Multiplying a stalled Moore's Law

  • Proximity Communication keeps increasing transistors/chip without a fabrication contribution

[Plot: Transistors per Chip (1,000 to 1,000,000,000, logarithmic) versus year (1970 to 2020). Curves: Moore's Law scaling; If Moore's Law stalls; PxC Arrays with increasing chip counts.]
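The claim on this slide is that the effective transistor count can keep growing through array size alone once per-chip scaling stops. The sketch below models that under the talk's premise that a PxC array with enough inter-chip bandwidth behaves like one chip, so an array of N chips counts as N times the transistors. The stall year, baseline, and array sizes are hypothetical.

```python
def transistors_per_chip(year: float, stall_year=None, base: float = 1_000.0,
                         base_year: float = 1970, doubling_years: float = 2.0) -> float:
    """Per-chip count that doubles every `doubling_years` and freezes at `stall_year`."""
    effective_year = year if stall_year is None else min(year, stall_year)
    return base * 2.0 ** ((effective_year - base_year) / doubling_years)


def effective_transistors(year: float, chips_in_array: int, stall_year=None) -> float:
    """Effective count for an array treated as a single chip (the slide's premise)."""
    return chips_in_array * transistors_per_chip(year, stall_year)


# Hypothetical roadmap: per-chip scaling stalls in 2010 while arrays keep growing.
array_sizes = {2010: 1, 2014: 4, 2018: 16}
stalled = {year: effective_transistors(year, chips, stall_year=2010)
           for year, chips in array_sizes.items()}

for year, count in stalled.items():
    print(year, f"{count:,.0f}")
```

In this toy roadmap the array size quadruples every four years, which reproduces a doubling every two years without any change in fabrication.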

SLIDE 29

Summary

  • Need for off-chip bandwidth motivates PxC
  • Good mechanical alignment enables PxC and its tremendous bandwidth increase
  • PxC multiplies Moore's Law by providing enough bandwidth to realize wafer-scale integration

SLIDE 30

http://research.sun.com/vlsi

Multiplying Moore's Law with Proximity Communication

SLIDE 31

References

(1) D. Hopkins, et al., “Circuit Techniques to Enable 430Gb/s/mm² Proximity Communication,” IEEE Int'l Solid-State Circuits Conference, Feb. 2007.
(2) R. Drost, et al., “Challenges in building a flat-bandwidth memory hierarchy for a large-scale computer with proximity communication,” Proceedings of the 13th Symposium on High Performance Interconnects, pp. 13-22, Aug. 2005.
(3) Krste Asanović, et al., “The Landscape of Parallel Computing Research: A View from Berkeley,” EECS Technical Report, in press, December 3, 2006.
(4) R. Drost, R. Ho, R. D. Hopkins, I. Sutherland, “Electronic Alignment for Proximity Communication,” IEEE Int'l Solid-State Circuits Conference, Feb. 2004.
(5) R. Drost, R. D. Hopkins, I. Sutherland, “Proximity Communications,” IEEE Custom Integrated Circuits Conference, pp. 469-472, Sept. 2003.
(6) J.L. Hennessy and D.A. Patterson, Computer Organization and Design, 2nd ed., Morgan Kaufmann Publishers, San Francisco, 1997.