
Tom Spyrou - PowerPoint PPT Presentation

Tom Spyrou Distinguished Architect


  1. Tom Spyrou, Distinguished Architect. TAU 2016

  2. 2X Core Performance; 5.5M Logic Elements; Up to 70% Lower Power; Heterogeneous 3D SiP Integration; Up to 10 TFLOPS; Intel 14 nm Tri-Gate; Quad-Core Cortex-A53 ARM Processor; Most Comprehensive Security

  3. Today’s architectures will not hold up to tomorrow’s performance demands
       − Making on-chip buses wider and wider is not sufficient; more is needed
     Need a bigger step forward than evolution provides
       − As geometries shrink, interconnect delays are dominating
     HyperFlex is built on familiar concepts
       − Retiming, Pipelining, Optimization
     With an innovative new approach
       − Not possible with conventional architecture

  4. HyperFlex has registers throughout the core fabric
     Bypassable Hyper-Registers in every routing segment
     Bypassable Hyper-Registers on all block inputs
       − ALMs, M20K blocks, DSP blocks, IO cells
     Register location is fine-grained
       − Throughout the interconnect
       − Available in optimal locations
     Allows a new and better approach to
       − Retiming
       − Pipelining
       − Optimization
     [Figure labels: clk, CRAM config]

  5. [Figure; the legend identifies the Hyper-Register symbol]

  6. Hyper-Registers throughout the FPGA fabric enable
       − Fine-grain Hyper-Retiming to eliminate critical paths
       − Zero-latency Hyper-Pipelining to eliminate routing delays
       − Flexible Hyper-Optimization for best-in-class performance
     Hyper-Aware design flow for accelerated timing closure with
       − Post place & route performance tuning
       − Hyper-Register-enabled synthesis and place & route for efficient pipelining
       − Fast Forward compilation enabling performance exploration
     Programmable clock tree synthesis offers
       − ASIC-like clocking to mitigate skew & uncertainty
       − Lower power through intelligent clock enablement

  7. Conventional architectures
       − Using register stages incurs significant additional delay
       − Limits the number of pipeline stages that can be added
     HyperFlex architecture
       − Significantly reduces the cost of adding pipeline stages to a design
     [Figure labels: LUT, LAB, Routing Wire]

  8. HyperFlex architecture
       − Significantly reduces the cost of adding pipeline stages to a design
     [Figure labels: LUT, LAB, Routing Wire]

  9. Large portion of die area is routing muxes
       − H3, H6, V4, etc., or into LAB

  10. Extend routing muxes to include “register” stage

  11. Add extra register locations
        1. Bypassable registers in routing muxes

  12. Add extra register locations
        1. Bypassable registers in routing muxes
        2. Bypassable inputs to LUTs, FFs, DSPs, etc.

  13. Add extra register locations
        1. Bypassable registers in routing muxes
        2. Bypassable inputs to LUTs, FFs, DSPs, etc.
      [Figure labels: inputs dataf0, datae0, datac0, dataa, datab, datac1, datae1, dataf1; Upper LUT Circuitry & Arithmetic; Lower LUT Circuitry & Arithmetic; FF feedback; To FFs; gnd; vcc]

  14. Add extra register locations
        1. Bypassable registers in routing muxes
        2. Bypassable inputs to LUTs, FFs, DSPs, etc.
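
The bypassable registers described in the slides above can be pictured with a small behavioral model. This is only an illustrative Python sketch, not Intel's implementation: the class name `BypassableRegister`, the `bypass` flag (standing in for a configuration bit), and the `simulate` helper are all hypothetical. It shows that a bypassed stage behaves as a wire, while an enabled stage adds one clock cycle of latency.

```python
from dataclasses import dataclass


@dataclass
class BypassableRegister:
    """Toy model of a routing-stage register with a bypass setting.

    `bypass` is a hypothetical stand-in for a configuration bit: when True
    the stage acts as a plain wire, when False it is a clocked register.
    """

    bypass: bool = True
    state: int = 0  # value currently held by the register

    def output(self, data_in: int) -> int:
        # Bypassed: the input passes straight through with no added latency.
        # Registered: the output is the value captured on the last clock edge.
        return data_in if self.bypass else self.state

    def clock(self, data_in: int) -> None:
        # On a clock edge, only a non-bypassed stage captures its input.
        if not self.bypass:
            self.state = data_in


def simulate(stages, samples):
    """Drive `samples` through a chain of stages, one sample per clock."""
    outputs = []
    for sample in samples:
        # Evaluate the chain combinationally, recording each stage's input.
        stage_inputs = []
        value = sample
        for stage in stages:
            stage_inputs.append(value)
            value = stage.output(value)
        outputs.append(value)
        # Then apply the clock edge to every stage at once.
        for stage, data_in in zip(stages, stage_inputs):
            stage.clock(data_in)
    return outputs
```

With every stage bypassed, `simulate` returns its input unchanged; each stage with `bypass=False` delays the stream by one cycle, which is the behavior pipelining relies on.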

  15. Three-step process to achieve maximum performance
        Step 1, Hyper-Retiming: no change, or minor RTL changes; 1.4X
        Step 2, Hyper-Pipelining: added pipelining; 1.6X
        Step 3, Hyper-Optimization: more effort; 2X or more
      Most of the gain comes from the first two steps
        − Uses well-understood retiming and pipelining techniques
        − Large performance gains come from relatively small effort
      More effort is required to implement the third step
        − May be required to achieve a 2X or greater performance gain

  16. More Performance
        − Enabling higher-performance applications
      Higher Productivity and Time to Market
        − Reduce engineering development time
        − Close timing faster
      Reduce Device Cost
        − Choose a less expensive, slower device: with HyperFlex 2X performance, can you use a slower speed grade device?
        − Choose a less expensive, smaller device: can you use a smaller device now that you have Hyper-Registers throughout the fabric? Could you run your bus at 1/2 the width and twice the frequency?
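
The closing question on slide 16 rests on simple arithmetic: raw bus throughput is width times clock frequency, so halving the width and doubling the frequency leaves throughput unchanged. The sketch below illustrates this in Python. The helper name `throughput_gbps` and the 512-bit at 300 MHz figures are made-up assumptions, not values from the presentation.

```python
def throughput_gbps(width_bits: int, freq_mhz: float) -> float:
    """Raw bus throughput in Gbit/s, assuming one word transferred per clock."""
    return width_bits * freq_mhz / 1000.0


# Hypothetical example values, chosen only to illustrate the trade-off.
wide_bus = throughput_gbps(512, 300.0)    # wide bus at a lower clock
narrow_bus = throughput_gbps(256, 600.0)  # half the width, twice the clock

print(wide_bus, narrow_bus)
```

Both configurations move the same number of bits per second, but the narrower bus needs half as many wires and registers per stage, which is the basis for the smaller-device argument.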

  17. [Title slide; text not recoverable]

  18. [Figure: register retiming example; labels: Retiming, Logic, Logic, Logic; timing values not recoverable]

  19. [Figure: two register retiming diagrams; labels: Retiming, Logic, Logic, Logic; timing values not recoverable]

  20. [Figure: register retiming example; labels: Retiming, Logic, Logic, Logic; timing values not recoverable]
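
The retiming idea named in these slides can be sketched as a small optimization: given a chain of combinational delays and a fixed number of registers, move the registers so that the longest register-to-register delay is as short as possible. The Python below is an exhaustive-search illustration only, not the algorithm used by any tool. The function names and the delay values are hypothetical, and the model assumes a register may sit at any boundary between delay segments, as fine-grained register locations would allow.

```python
from itertools import combinations


def worst_stage_delay(delays, cuts):
    """Longest register-to-register delay for registers placed at `cuts`.

    A cut at index i places a register between delays[i - 1] and delays[i].
    """
    bounds = [0, *sorted(cuts), len(delays)]
    return max(sum(delays[a:b]) for a, b in zip(bounds, bounds[1:]))


def retime(delays, num_registers):
    """Try every placement and return the cuts with the shortest worst stage."""
    positions = range(1, len(delays))
    return min(
        combinations(positions, num_registers),
        key=lambda cuts: worst_stage_delay(delays, cuts),
    )


def fmax_mhz(delay_ns):
    """Maximum clock frequency in MHz for a critical path of `delay_ns`."""
    return 1000.0 / delay_ns


# Hypothetical segment delays in ns, with one register to place.
segment_delays = [1.0, 0.5, 1.5, 1.0, 1.0]
original_cuts = (2,)                      # register after the second segment
retimed_cuts = retime(segment_delays, 1)  # best position found by search

before = worst_stage_delay(segment_delays, original_cuts)
after = worst_stage_delay(segment_delays, retimed_cuts)
print(f"before: {before} ns, {fmax_mhz(before):.0f} MHz")
print(f"after:  {after} ns, {fmax_mhz(after):.0f} MHz")
```

Exhaustive search is only practical for short chains; it is used here because it makes the objective (minimize the worst stage delay) easy to see.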
