
SLIDE 1

Exploring Benefits and Designs of Optically Connected Disintegrated Processor Architecture

Yan Pan, Yigit Demir, Nikos Hardavellas, John Kim†, Gokhan Memik

EECS Department, Northwestern University, Evanston, IL, USA

† CS Department, KAIST, Daejeon, Korea

SLIDE 2

Motivation   OCDP Architecture   Power Comparison   Conclusion

WINDS 2010 (in conj. with MICRO 43)   Nikos Hardavellas

Motivation

Transistor density grows exponentially
But processors are physically constrained

– Low yield, bandwidth wall, power wall
– Dark silicon: we can build dense devices we cannot afford to power

Optically-Connected Disintegrated Processor (OCDP)

– Divide (impractical) monolithic processor into chiplets
– Improves yield
– Breaks the bandwidth wall
– Breaks the power wall
  • Spread out chiplets, cheaper cooling

SLIDE 3

Motivation

Advantage of nanophotonics

– Latency
– Bandwidth density

Using nanophotonics for inter-chip interconnect

– Reduced memory latency
– Increased off-chip bandwidth
– Increased total chip area
– Increased power budget

Analytical model* for performance estimation

* N. Hardavellas et al., Tech Report NWU-EECS-10-05, Mar. 2010.

SLIDE 4

Memory Latency

[Figure: Speedup vs. Memory Latency (ns), x-axis 0 to 30 ns, baseline marked]

SLIDE 5

Off-chip Bandwidth

[Figure: Speedup vs. Off-chip Bandwidth (GB/s), x-axis 0 to 250 GB/s, baseline marked]

SLIDE 6

Scaling Power, Chip Area

[Figure: Speedup vs. Total Die Area (mm^2), x-axis 200 to 1400 mm^2, baseline marked. Series: Scalable Power Budget, 4x Off-chip BW; Scalable Power Budget, 2x Off-chip BW; Scalable Power Budget, 1x Off-chip BW; Fixed Power Budget, 1x Off-chip BW]

SLIDE 7

Motivation

Performance impact

– Reduced memory latency: minimal
– Improved off-chip bandwidth: small
– Total chip area: small
– Power budget: big

Power budget scalability is critical

– Spread out chiplets
– Cheaper cooling

Optically-Connected Disintegrated Processor (OCDP)

SLIDE 8

Off-chip Optical Channels

Material            Optical Loss   Propagation Speed   Pitch (density)
Silicon Waveguide   0.3 dB/cm*     0.286c              20 um
Optic Fiber         0.2 dB/km      0.676c              250 um

Optical fiber is low-loss, high speed

– Enables further spreading out chiplets
– BW density was a challenge

* J. Cardenas et al., Optics Express 2009
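As a rough illustration of why fiber allows chiplets to be spread further apart, the sketch below compares loss and propagation delay over the same distance for the two media. It assumes the table values read as 0.3 dB/cm and 0.286c for the silicon waveguide and 0.2 dB/km and 0.676c for the fiber; the function name `link_cost` and the 10 cm example distance are hypothetical and not from the talk.

```python
C_CM_PER_NS = 29.9792458  # speed of light in vacuum, in cm per ns

# Values as read from the slide's table (assumed transcription).
MEDIA = {
    "silicon waveguide": {"loss_db_per_cm": 0.3, "speed_frac_c": 0.286},
    # 0.2 dB/km converted to dB/cm (1 km = 1e5 cm)
    "optic fiber": {"loss_db_per_cm": 0.2 / 1e5, "speed_frac_c": 0.676},
}


def link_cost(medium, length_cm):
    """Return (loss in dB, propagation delay in ns) for a link of length_cm."""
    m = MEDIA[medium]
    loss_db = m["loss_db_per_cm"] * length_cm
    delay_ns = length_cm / (m["speed_frac_c"] * C_CM_PER_NS)
    return loss_db, delay_ns


# Hypothetical example: a 10 cm link in each medium.
for name in MEDIA:
    loss, delay = link_cost(name, 10.0)
    print(f"{name}: {loss:.6f} dB, {delay:.3f} ns over 10 cm")
```

Under these assumptions the waveguide loses about 3 dB over 10 cm while the fiber loss is negligible, which is consistent with the slide's point that fiber enables spreading out chiplets.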

SLIDE 9

Dense Off-chip Coupling

Dense optical fiber array [Lee et al., OSA/OFC/NFOEC 2010]
<1dB loss, 8 Tbps/mm demonstrated
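To put the demonstrated coupling density in perspective, this sketch estimates how much chip edge a given off-chip bandwidth would occupy at 8 Tbps/mm. The helper name `coupling_edge_mm` and the 250 GB/s example value are illustrative assumptions, not figures from the talk.

```python
COUPLING_DENSITY_TBPS_PER_MM = 8.0  # demonstrated density quoted on the slide


def coupling_edge_mm(bandwidth_gbytes_per_s):
    """Chip edge length (mm) needed to couple the given bandwidth off-chip."""
    tbps = bandwidth_gbytes_per_s * 8 / 1000.0  # GB/s -> Tbit/s
    return tbps / COUPLING_DENSITY_TBPS_PER_MM


# Hypothetical target of 250 GB/s of off-chip bandwidth.
print(coupling_edge_mm(250.0), "mm of chip edge")
```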

SLIDE 10

OCDP Design Considerations

Inter-chiplet optical channel technology

– Optic fiber for low loss

Inter-chiplet optical channel organization

– Point-to-point [Koka et al., ISCA 2010]
– Minimize waveguide and coupler loss

On-chip topology

– Scalable chiplet size

On-chip / off-chip bandwidth interfacing

– Distributed BW, seamless integration

SLIDE 11

OCDP Architecture

Cross-chiplet assemblies share an optical bus, forming optical crossbars (FlexiShare)

[Figure: five chiplets (Chiplet 0 to Chiplet 4) linked by optical fiber through couplers, with a laser source, electrical clusters, and an example src-to-dst path]

SLIDE 12

Firefly On-chip Topology

Firefly on-chip topology [Pan et al., ISCA 2009]

– Flexible chiplet sizing, optical on-chip communication

FlexiShare optical crossbars [Pan et al., HPCA 2010]

– Flexible bandwidth provisioning
– Light-weight optical arbitration needed, proposed

[Figure: Firefly topology within Chiplet 0: clusters C0 to C3, routers labeled CxRy (e.g., C0R0 to C0R3), four processors (P) per router, assemblies A0 to A3; FlexiShare crossbar with channels CH0 to CHM-1 and routers R0 to Rk-1]

SLIDE 13

Extending across chiplets

Distributed bandwidth across chiplets
Flexible inter-chiplet bandwidth provisioning
Minimal number of couplers
Seamless on-chip/off-chip interfacing

[Figure: Chiplet 0 and Chiplet 1]

SLIDE 14

Technology Assumptions

Moderate DWDM (16-way)

Parameter             Loss
Coupler               1 dB
Splitters             1 dB
Non-linear            1 dB
Modulator Insertion   0.1 dB
Waveguide             0.3 dB/cm
Ring Through          0.001 dB
Filter Drop           1.5 dB
PhotoDetector         0.1 dB

Parameter              Value
Detector Sensitivity   0.01 mW
DWDM                   16 λ
Fiber coupler loss     0.1
Fiber loss             2.00E-06 dB/cm
Ring heating power     40 uW/ring
Modulation Power       80 fJ/bit
Demodulation Power     40 fJ/bit
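The parameters above are the usual inputs to an optical loss budget. The sketch below is one common way such a budget could be assembled, not necessarily the authors' exact model: it sums the per-component losses along a path and scales the detector sensitivity by the total loss to get the laser power needed per wavelength. The path composition (waveguide length, rings passed, fiber length, number of fiber couplers) is hypothetical, and the fiber coupler loss of 0.1 is assumed to be in dB because the slide gives no unit.

```python
# Per-component losses in dB, taken from the table on this slide.
FIXED_LOSS_DB = {
    "coupler": 1.0,
    "splitter": 1.0,
    "non_linear": 1.0,
    "modulator_insertion": 0.1,
    "filter_drop": 1.5,
    "photodetector": 0.1,
}
WAVEGUIDE_DB_PER_CM = 0.3
RING_THROUGH_DB = 0.001
FIBER_DB_PER_CM = 2.0e-6
FIBER_COUPLER_DB = 0.1  # assumption: unit is dB (the slide lists only "0.1")
DETECTOR_SENSITIVITY_MW = 0.01


def path_loss_db(waveguide_cm, rings_passed, fiber_cm, fiber_couplers):
    """Total loss (dB) along one hypothetical optical path."""
    return (
        sum(FIXED_LOSS_DB.values())
        + WAVEGUIDE_DB_PER_CM * waveguide_cm
        + RING_THROUGH_DB * rings_passed
        + FIBER_DB_PER_CM * fiber_cm
        + FIBER_COUPLER_DB * fiber_couplers
    )


def laser_power_mw(loss_db):
    """Laser power per wavelength so the detector still sees its sensitivity."""
    return DETECTOR_SENSITIVITY_MW * 10 ** (loss_db / 10)


# Hypothetical path: 2 cm of waveguide, 1000 rings passed, 10 cm of fiber, 2 fiber couplers.
loss = path_loss_db(2.0, 1000, 10.0, 2)
print(f"path loss {loss:.3f} dB, laser power {laser_power_mw(loss):.4f} mW per wavelength")
```

Because required laser power grows exponentially with loss in dB, shortening the on-chip waveguide portion of a path matters far more than the fiber length, which contributes almost nothing at 2.00E-06 dB/cm.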

SLIDE 15

Optical Power (320-core)

5-chiplet OCDP vs. single-chip topologies
Total number of optical channels (wavelengths) held constant.

[Figure: Total Optical Loss (dB) for OCDP (5-chiplet), Firefly (CL4), Firefly (CL8), and FlexiShare]

[Figure: Total Number of Ring Resonators (thousands) for OCDP (5-chiplet), Firefly (CL4), Firefly (CL8), and FlexiShare]

SLIDE 16

Per-Core Network Static Power

~30% power reduction compared to the best alternative.

[Figure: Total Static Power Per Core for OCDP (5-chiplet), Firefly (CL4), Firefly (CL8), and FlexiShare]
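A per-core static power figure of this kind can be sketched as follows, assuming static network power is the sum of ring heating and laser power; the slides do not spell out the breakdown, so this decomposition is an assumption. The 40 uW/ring heating value comes from the Technology Assumptions slide; the ring count, laser power, and function names are hypothetical.

```python
RING_HEATING_UW = 40.0  # per ring, from the Technology Assumptions slide


def static_power_per_core_mw(num_rings, laser_power_mw, num_cores):
    """Static network power per core (mW): ring heating plus laser, split evenly."""
    heating_mw = num_rings * RING_HEATING_UW / 1000.0
    return (heating_mw + laser_power_mw) / num_cores


def reduction(new_mw, old_mw):
    """Fractional power reduction of new_mw relative to old_mw."""
    return 1.0 - new_mw / old_mw


# Hypothetical inputs: 100,000 rings, 800 mW of laser power, 320 cores.
print(static_power_per_core_mw(100_000, 800.0, 320), "mW per core")
```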

SLIDE 17

Scaling Up

OCDP limits the total on-chip waveguide length
Better optical scalability

[Figure: Total Optical Loss (dB) for OCDP (P5, 320), OCDP (P5, 1280), Firefly (CL8, 1280), FlexiShare (1280), OCDP (P9, 2304), and OCDP (P17, 4352)]

SLIDE 18

Scaling Up

OCDP shows very good power scalability.
Single-chip is impractical for a 1280-core processor

[Figure: Total Static Power Per Core, log scale, for OCDP (P5, 320), OCDP (P5, 1280), Firefly (CL8, 1280), FlexiShare (1280), OCDP (P9, 2304), and OCDP (P17, 4352)]

SLIDE 19

Conclusion

OCDP leverages

– Low latency / high bandwidth density
– Low loss optic fibers

Power scalability is critical

– Minimize optical loss on the path

Seamless on-chip / off-chip interfacing

– Firefly intra-chiplet (distributed off-chiplet BW)
– Point-to-point (Dragonfly) inter-chiplet

Performance evaluation needed
Chiplet composition to be explored

SLIDE 20

THANK YOU!

Questions?

SLIDE 21

On-chip Optical Channel

Silicon photonics with DWDM

[Figure: on-chip optical channel: (off-chip) laser source, waveguide, resonant modulators driven by an electrical signal, filters, and a photo detector producing an electrical signal]