
Exploring Benefits and Designs of Optically Connected Disintegrated Processor Architecture

Yan Pan, Yigit Demir, Nikos Hardavellas, John Kim, Gokhan Memik


  1. Exploring Benefits and Designs of Optically Connected Disintegrated Processor Architecture
     Yan Pan, Yigit Demir, Nikos Hardavellas, John Kim†, Gokhan Memik
     EECS Department, Northwestern University, Evanston, IL, USA
     †CS Department, KAIST, Daejeon, Korea

  2. Motivation
     Outline: Motivation, OCDP Architecture, Power Comparison, Conclusion
     • Transistor density grows exponentially
     • But processors are physically constrained
       – Low yield, bandwidth wall, power wall
       – Dark silicon: we can build dense devices that we cannot afford to power
     • Optically-Connected Disintegrated Processor (OCDP)
       – Divide the (impractical) monolithic processor into chiplets
       – Improves yield
       – Breaks the bandwidth wall
       – Breaks the power wall
         • Spread out chiplets, cheaper cooling
     Nikos Hardavellas, WINDS 2010 (in conj. with MICRO 43)
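The yield argument for chiplets can be illustrated with a simple Poisson defect model. This is a minimal sketch, not the authors' analysis: the yield model, the defect density, and the die area are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math

def die_yield(area_mm2, defects_per_mm2):
    """Poisson yield model (assumed): probability that a die has zero defects."""
    return math.exp(-area_mm2 * defects_per_mm2)

def silicon_per_good_processor(total_area_mm2, n_chiplets, defects_per_mm2):
    """Expected silicon area fabricated per working processor.

    Assumes chiplets are tested individually before assembly, so only
    defective chiplets are discarded rather than the whole processor.
    """
    chiplet_area = total_area_mm2 / n_chiplets
    return total_area_mm2 / die_yield(chiplet_area, defects_per_mm2)

if __name__ == "__main__":
    AREA_MM2 = 800.0        # hypothetical total logic area
    DEFECTS_PER_MM2 = 0.002 # hypothetical defect density
    for n in (1, 5):
        area = silicon_per_good_processor(AREA_MM2, n, DEFECTS_PER_MM2)
        print(f"{n} chiplet(s): {area:.0f} mm^2 of silicon per good processor")
```

Under these assumptions the monolithic die pays for its low yield with far more fabricated silicon per working part, which is the effect the "improves yield" bullet points at.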

  3. Motivation
     • Advantages of nanophotonics
       – Latency
       – Bandwidth density
     • Using nanophotonics for inter-chip interconnect
       – Reduced memory latency
       – Increased off-chip bandwidth
       – Increased total chip area
       – Increased power budget
     • Analytical model* for performance estimation
     * N. Hardavellas et al., Tech Report NWU-EECS-10-05, Mar. 2010.

  4. Memory Latency
     [Figure: speedup relative to baseline vs. memory latency (ns)]

  5. Off-chip Bandwidth
     [Figure: speedup relative to baseline vs. off-chip bandwidth (GB/s)]

  6. Scaling Power, Chip Area
     [Figure: speedup relative to baseline vs. total die area (mm²), with four curves: scalable power budget at 4x, 2x, and 1x off-chip BW, and fixed power budget at 1x off-chip BW]

  7. Motivation
     • Performance impact
       – Reduced memory latency → minimal
       – Improved off-chip bandwidth → small
       – Total chip area → small
       – Power budget → big
     • Power budget scalability is critical
       – Spread out chiplets
       – Cheaper cooling
     • Optically-Connected Disintegrated Processor (OCDP)

  8. Off-chip Optical Channels
     Material             Optical loss   Propagation speed   Pitch (density)
     Silicon waveguide*   0.3 dB/cm      0.286c              20 µm
     Optic fiber          0.2 dB/km      0.676c              250 µm
     • Optical fiber is low-loss, high speed
       – Enables further spreading out chiplets
       – BW density was a challenge
     * J. Cardenas et al., Optics Express 2009
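To make the fiber versus waveguide comparison concrete, the sketch below computes loss and propagation delay over a given span. It is illustrative only: the per-length losses and the propagation speeds (as fractions of c) are the values read from this slide's table, the 10 cm span is a hypothetical distance, and the function name is made up for the example.

```python
C_CM_PER_NS = 29.9792458  # speed of light in vacuum, in cm per ns

MEDIA = {
    # name: (loss in dB per cm, propagation speed as a fraction of c)
    "silicon waveguide": (0.3, 0.286),
    "optic fiber": (0.2 / 1e5, 0.676),  # 0.2 dB/km converted to dB/cm
}

def link_cost(medium, length_cm):
    """Return (loss in dB, one-way delay in ns) for a span of one medium."""
    loss_db_per_cm, speed_fraction = MEDIA[medium]
    loss_db = loss_db_per_cm * length_cm
    delay_ns = length_cm / (speed_fraction * C_CM_PER_NS)
    return loss_db, delay_ns

if __name__ == "__main__":
    SPAN_CM = 10.0  # hypothetical chiplet-to-chiplet distance
    for name in MEDIA:
        loss_db, delay_ns = link_cost(name, SPAN_CM)
        print(f"{name}: {loss_db:.6f} dB, {delay_ns:.3f} ns over {SPAN_CM} cm")
```

With these numbers the fiber span is both faster and several orders of magnitude less lossy, which is why spreading chiplets apart over fiber costs little optical power.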

  9. Dense Off-chip Coupling
     • Dense optical fiber array [Lee et al., OSA/OFC/NFOEC 2010]
     • <1 dB loss, 8 Tbps/mm demonstrated
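As a rough feel for what 8 Tbps/mm means, the sketch below converts a target off-chip bandwidth into the chip-edge length a fiber array would occupy. It assumes the demonstrated density can be used directly as a design figure, which the slides do not claim; the function name and the 250 GB/s target are hypothetical.

```python
def coupling_edge_mm(bandwidth_gbytes_per_s, density_tbps_per_mm=8.0):
    """Chip-edge length (mm) needed to couple a given off-chip bandwidth.

    Converts GB/s to Tbps (8 bits per byte, 1000 Gbps per Tbps) and divides
    by the assumed coupling density in Tbps per mm of chip edge.
    """
    tbps = bandwidth_gbytes_per_s * 8.0 / 1000.0
    return tbps / density_tbps_per_mm

if __name__ == "__main__":
    target = 250.0  # hypothetical off-chip bandwidth target in GB/s
    print(f"{target} GB/s needs {coupling_edge_mm(target)} mm of edge")
```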

  10. OCDP Design Considerations
      • Inter-chiplet optical channel technology
        – Optic fiber for low loss
      • Inter-chiplet optical channel organization
        – Point-to-point [Koka et al., ISCA 2010]
        – Minimize waveguide and coupler loss
      • On-chip topology
        – Scalable chiplet size
      • On-chip / off-chip bandwidth interfacing
        – Distributed BW, seamless integration

  11. OCDP Architecture
      [Figure: Chiplets 0 to 4 with electrical clusters, a laser source, and optical fiber couplers; src/dst labels mark the inter-chiplet paths]
      • Cross-chiplet assemblies share an optical bus, forming optical crossbars (FlexiShare)

  12. Firefly On-chip Topology
      [Figure: Firefly clusters and routers on Chiplet 0, connected through FlexiShare channels CH 0 to CH M-1]
      • Firefly on-chip topology [Pan et al., ISCA 2009]
        – Flexible chiplet sizing, optical on-chip communication
      • FlexiShare optical crossbars [Pan et al., HPCA 2010]
        – Flexible bandwidth provisioning
        – Light-weight optical arbitration needed, proposed

  13. Extending across chiplets
      [Figure: Chiplet 0 and Chiplet 1]
      • Distributed bandwidth across chiplets
      • Flexible inter-chiplet bandwidth provisioning
      • Minimal number of couplers
      • Seamless on-chip/off-chip interfacing

  14. Technology Assumptions
      Loss parameter         Value
      Coupler                1 dB
      Splitters              1 dB
      Non-linear             1 dB
      Modulator insertion    0.1 dB
      Waveguide              0.3 dB/cm
      Ring through           0.001 dB
      Filter drop            1.5 dB
      Photodetector          0.1 dB
      Parameter              Value
      Detector sensitivity   0.01 mW
      DWDM                   16-way
      Fiber coupler loss     0.1
      Fiber loss             2.00E-06 dB/cm
      Ring heating power     40 uW/ring
      Modulation power       80 fJ/bit
      Demodulation power     40 fJ/bit
      • Moderate DWDM (16-way)
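These parameters are the kind of input an optical loss budget needs. The sketch below sums per-component losses along one path and derives the laser power per wavelength required to meet the detector sensitivity. It is an assumption-laden illustration, not the authors' power model: the path composition (which components a signal crosses, and how many), the example lengths, and the function names are hypothetical, and the fiber coupler loss is assumed to be in dB because the slide gives no unit.

```python
# Fixed per-path losses in dB, taken from the technology assumptions.
FIXED_LOSS_DB = {
    "coupler": 1.0,
    "splitter": 1.0,
    "nonlinear": 1.0,
    "modulator_insertion": 0.1,
    "filter_drop": 1.5,
    "photodetector": 0.1,
}
WAVEGUIDE_DB_PER_CM = 0.3
FIBER_DB_PER_CM = 2.0e-6
RING_THROUGH_DB = 0.001
FIBER_COUPLER_DB = 0.1          # unit assumed to be dB
DETECTOR_SENSITIVITY_MW = 0.01

def path_loss_db(waveguide_cm, fiber_cm, rings_passed, fiber_couplers):
    """Total loss (dB) along one assumed source-to-detector path."""
    return (
        sum(FIXED_LOSS_DB.values())
        + WAVEGUIDE_DB_PER_CM * waveguide_cm
        + FIBER_DB_PER_CM * fiber_cm
        + RING_THROUGH_DB * rings_passed
        + FIBER_COUPLER_DB * fiber_couplers
    )

def laser_power_mw(loss_db):
    """Laser power per wavelength (mW) so the detector still sees its sensitivity."""
    return DETECTOR_SENSITIVITY_MW * 10 ** (loss_db / 10.0)

if __name__ == "__main__":
    # Hypothetical path: 4 cm of waveguide, 20 cm of fiber, 100 rings, 2 fiber couplers.
    loss = path_loss_db(4.0, 20.0, 100, 2)
    print(f"path loss {loss:.3f} dB, laser power {laser_power_mw(loss):.4f} mW per wavelength")
```

Because laser power grows exponentially with loss in dB, trimming on-chip waveguide length pays off far more than shortening the fiber.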

  15. Optical Power (320-core)
      [Figures: total optical loss (dB) and total number of ring resonators (thousands) for OCDP (5-chiplet), two Firefly configurations, and FlexiShare]
      • 5-chiplet OCDP vs. single-chip topologies
      • Total number of optical channels (wavelengths) held constant
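Ring counts matter for power because every ring resonator draws heating power. A minimal sketch, assuming the 40 uW/ring figure from the technology assumptions applies uniformly to every ring (the slides do not spell out the accounting); the function name and the ring count in the example are hypothetical.

```python
def ring_heating_watts(num_rings, microwatts_per_ring=40.0):
    """Static heating power (W) for a given number of ring resonators."""
    return num_rings * microwatts_per_ring * 1e-6

if __name__ == "__main__":
    rings = 100_000  # hypothetical ring count
    print(f"{rings} rings draw about {ring_heating_watts(rings):.1f} W of heating power")
```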

  16. Per-Core Network Static Power
      [Figure: total static power per core (mW) for OCDP (5-chiplet), two Firefly configurations, and FlexiShare]
      • ~30% power reduction compared to the best alternative

  17. Scaling Up
      [Figure: total optical loss (dB) for OCDP (P5, 320), OCDP (P5, 1280), Firefly (1280), FlexiShare (1280), OCDP (P9, 2304), and OCDP (P17, 4352)]
      • OCDP limits the total on-chip waveguide length
      • Better optical scalability

  18. Scaling Up
      [Figure: total static power per core (mW, log scale) for OCDP (P5, 320), OCDP (P5, 1280), Firefly (1280), FlexiShare (1280), OCDP (P9, 2304), and OCDP (P17, 4352)]
      • OCDP shows very good power scalability
      • Single-chip is impractical for a 1280-core processor

  19. Conclusion
      • OCDP leverages
        – Low latency / high bandwidth density
        – Low loss optic fibers
      • Power scalability is critical
        – Minimize optical loss on the path
      • Seamless on-chip / off-chip interfacing
        – Firefly intra-chiplet (distributed off-chiplet BW)
        – Point-to-point (Dragonfly) inter-chiplet
      • Performance evaluation needed
      • Chiplet composition to be explored

  20. Questions?
