
Post-Silicon Patchable Hardware



  1. Post-Silicon Patchable Hardware. Masahiro Fujita, VLSI Design and Education Center (VDEC), The University of Tokyo. July 22nd, 2011

  2. Respin Statistics (North America). [Bar chart: first silicon success rate (y-axis, 0% to 100%) by year (x-axis: 1998, 2000, 2002, 2004); bar labels: 48%, 44%, 39%, 33%.] Respin is becoming more frequent. [G. S. Spirakis, DATE 2006] Fujita Lab. – VLSI Design and Education Center - University of Tokyo

  3. Manufacturing Cost. [Chart: mask set cost (US$), y-axis from $1M to $5M, across the 90nm, 65nm, 45nm, and 32nm nodes.] Respin risk is increasing dramatically. [Nikkei Electronics, 2008]

  4. Causes for Respins. IC/ASIC designs having one or more re-spins, by type of flaw: Logic/Function 91%, Clock 36%, Fast Path 32%, Slow Path 26%, Delay/Glitch 26%, Power 21%, Yield 19%, Analog 19%, Firmware 17%, Mixed Signal 15%, IR Drop 15%. Logic and functional errors are the leading cause. [Collett International Research 2005]

  5. Conventional SoC Design Flow. [Flow diagram: High-Level Description → High-Level Synthesis → Machine-Generated RTL → Logic Synthesis → Place & Route → SoC Design. Pre-silicon RTL verification loop: Verification/Simulation, Bug Localization, Bug Fix. Post-silicon RTL validation loop: Error Detection, Bug Localization, Bug Fix. Annotations: "Need to understand RTL"; "75% of the whole development time" [Source: Intel 2007].]

  6. Proposed Patchable SoC Design Flow. [Flow diagram: High-Level Description → High-Level Synthesis of Patchable HW → Logic Synthesis → Place & Route → Patchable SoC Design. Pre-silicon high-level verification loop: Verification/Simulation, Error Detection, Bug Localization, Bug Fix. Post-silicon high-level ECO loop: Error Detection, Bug Localization, Bug Fix, Patch Compilation.] No respin needed!

  7. Proposed Patchable Hardware. Efficeum offers behavioral-level programmability using a patchable controller. [Diagram: a custom datapath (ALU1, ALU2) driven by a patchable controller made of a hardwired FSM and a patch FSM.] Partially-Programmable Circuit (PPC) offers logic-level programmability using a mixed gate/LUT circuit.

  8. Efficeum: An Energy-Efficient Patchable Accelerator for Post-Silicon Engineering Changes

  9. Energy Efficiency vs. Programmability. Energy efficiency of 90nm OFDM: fixed-function HW 200 GOPS/W; embedded processor 4 GOPS/W (50X gap); laptop processor 0.05 GOPS/W (4,000X gap). Target: high performance (>100 GOPS) under power/thermal constraints (~1 W). Energy efficiency (in [GOPS/W] or [J/op]) measures how much computation can be done in a given energy. Slowing down the chip reduces power but does not improve efficiency.
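The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the three GOPS/W values and the 100 GOPS target quoted above; the function and variable names are illustrative, not from the presentation.

```python
# Efficiency figures quoted on the slide (90nm OFDM), in GOPS/W.
EFFICIENCY_GOPS_PER_W = {
    "fixed_function_hw": 200.0,
    "embedded_proc": 4.0,
    "laptop_proc": 0.05,
}


def picojoules_per_op(gops_per_w: float) -> float:
    """Convert GOPS/W to pJ/op: 1 GOPS/W is 1e9 operations per joule."""
    return 1e12 / (gops_per_w * 1e9)


def watts_for(gops: float, gops_per_w: float) -> float:
    """Power needed to sustain `gops` of throughput at a given efficiency."""
    return gops / gops_per_w


baseline = EFFICIENCY_GOPS_PER_W["fixed_function_hw"]
for name, eff in EFFICIENCY_GOPS_PER_W.items():
    gap = baseline / eff                   # efficiency gap vs. fixed-function HW
    energy = picojoules_per_op(eff)        # energy per operation
    power = watts_for(100.0, eff)          # power at the 100 GOPS target
    print(f"{name}: {gap:g}X gap, {energy:g} pJ/op, {power:g} W at 100 GOPS")
```

Only the fixed-function figure stays under the roughly 1 W budget at 100 GOPS, which is the point of the slide. Because pJ/op is independent of clock rate in this model, lowering throughput lowers power but leaves the energy per operation unchanged.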

  10. Fixed-Function Accelerator. Achieves high energy efficiency by customization: hardwired controller → no reprogrammability; highly customized datapath → low flexibility. [Diagram: a hardwired controller drives control signals to a local store, registers (Reg 1, Reg 2, Reg 3, ...), and functional units (ALU1, ALU2, comparator, multiplier, ...) connected by a sparse interconnect network.]

  11. Proposed Patchable Accelerator. Behavioral reprogrammability by control patching; increased flexibility by adding a register file via a data bus. [Diagram: the fixed-function accelerator (hardwired controller, local store, Reg 1 to Reg 3, sparse interconnect network, ALU1, ALU2, comparator, multiplier) extended with patch logic on a control bus and a register file on a data bus.]

  12. Patch Logic. [Diagram labels: Program Counter; patched PC values PC1, PC2, ... with equality comparators (=?) and a ">PC patch?" comparator; entries PC1', PC2', ...; Hardwired Controller; Control Signal; Control Signal Memory; Patch Memory.]
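One plausible reading of the patch-logic diagram can be sketched as a small behavioral model. This is an assumption, not the circuit from the slides: the "=?" comparators are modeled as a lookup of patched PC values, the ">PC patch?" comparator as a threshold above which states live entirely in patch memory, and every table entry below is hypothetical.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, int]  # (control word, next PC); hypothetical encoding


def select_control(pc: int,
                   hardwired: Dict[int, Entry],
                   overrides: Dict[int, Entry],
                   patch_states: Dict[int, Entry],
                   pc_patch: int) -> Tuple[str, Entry]:
    """Pick the source of the control signals for the current PC."""
    if pc in overrides:        # "=?" comparator hit: a patched hardwired state
        return "patch", overrides[pc]
    if pc > pc_patch:          # ">PC patch?": a state that exists only in patch memory
        return "patch", patch_states[pc]
    return "hardwired", hardwired[pc]


def trace(start_pc: int, steps: int,
          hardwired: Dict[int, Entry],
          overrides: Dict[int, Entry],
          patch_states: Dict[int, Entry],
          pc_patch: int) -> List[Tuple[int, str, str]]:
    """Run the controller for `steps` cycles and record (pc, source, word)."""
    pc, out = start_pc, []
    for _ in range(steps):
        source, (word, next_pc) = select_control(
            pc, hardwired, overrides, patch_states, pc_patch)
        out.append((pc, source, word))
        pc = next_pc
    return out


# Hypothetical 3-state hardwired controller with one patched state (PC 2)
# that detours through one new patch-only state (PC 4).
HARDWIRED = {1: ("add", 2), 2: ("mul", 3), 3: ("store", 1)}
OVERRIDES = {2: ("mul", 4)}
PATCH_STATES = {4: ("shift", 3)}
PC_PATCH = 3

for row in trace(1, 4, HARDWIRED, OVERRIDES, PATCH_STATES, PC_PATCH):
    print(row)
```

With empty patch tables the model behaves exactly like the hardwired controller, which matches the idea that an unpatched chip pays only for the comparators.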

  13. Patching Example (1/2). Scheduling result of initial design. [Table with columns PC, ALU1, ALU2, MUL1, Next PC, and rows grouped into hardwired logic and patch logic; cell contents not recoverable. Figure: dataflow graph for initial design.]

  14. Patching Example (2/2). Scheduling result after engineering change. [Table with columns PC, ALU1, ALU2, MUL1, Next PC, and rows grouped into hardwired logic and patch logic; cell contents not recoverable. Figure: dataflow graph after EC.]

  15. Patching-Based Post-Silicon ECO Flow. [Flow diagram: C Program → High-Level Synthesis → Fixed-Function HW → Inserting RF & Patch Logic → Efficeum. Post-silicon ECO: Post-ECO Program (Spec. Change & Bug Fix) → Computing the Difference Between Two Programs → Patch Compilation → Patch → Writing into Patch Memory.]
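The "computing the difference" and "patch compilation" steps can be illustrated with a deliberately simplified sketch. It assumes both programs have already been scheduled into control tables mapping a PC to a (control word, next PC) pair, which is not how the slides describe the input (they diff two programs); the function name `compile_patch` and all table contents are hypothetical.

```python
from typing import Dict, Tuple

Entry = Tuple[str, int]  # (control word, next PC); hypothetical encoding


def compile_patch(original: Dict[int, Entry],
                  modified: Dict[int, Entry]
                  ) -> Tuple[Dict[int, Entry], Dict[int, Entry]]:
    """Return (overrides, new_states) to be written into patch memory.

    overrides:  states present in both tables whose entry changed.
    new_states: states that exist only in the post-ECO table.
    States that are unchanged stay in the hardwired controller.
    """
    overrides: Dict[int, Entry] = {}
    new_states: Dict[int, Entry] = {}
    for pc, entry in sorted(modified.items()):
        if pc not in original:
            new_states[pc] = entry
        elif original[pc] != entry:
            overrides[pc] = entry
    return overrides, new_states


# Hypothetical schedules before and after an engineering change.
before = {1: ("add", 2), 2: ("mul", 3), 3: ("store", 1)}
after = {1: ("add", 2), 2: ("mul", 4), 3: ("store", 1), 4: ("shift", 3)}

overrides, new_states = compile_patch(before, after)
print("overrides:", overrides)
print("new states:", new_states)
```

Keeping the patch proportional to the difference, rather than to the whole program, is what lets a small patch memory cover typical bug fixes in this simplified picture.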

  16. Experimental Setup. Example: 8x8 IDCT. Technology: FreePDK 45nm. Logic synthesis: Synopsys Design Compiler Ultra (high-effort options with gated clock optimization). P&R: Cadence SoC Encounter. Simulation: Synopsys VCS. Power/timing analysis: Synopsys PrimeTime PX (simulation results are used for power calculation). Energy efficiencies (GOPS/W) are compared.

  17. Energy Efficiency Comparison. [Chart: energy efficiency from No Patching to Fully-Patched; labels: 6%, 48%, 89%.] 8x8 IDCT (FreePDK 45nm technology). Offers a tradeoff between efficiency and programmability.

  18. Area & Performance Comparison. [Chart labels: 5%, 20%, 5X Smaller, Up to 40% Increase.]

  19. Area Comparison. [Bar chart: fully-programmable accelerator, Efficeum, single-function hardwired accelerator; annotations: 4x reduction, 18% increase.] (Technology: FreePDK 45nm (NCSU/Nangate), Operating Frequency: 200MHz)

  20. Power Comparison. [Bar chart: fully-programmable accelerator, Efficeum, single-function hardwired accelerator; annotations: 6x reduction, 13% increase.] (Technology: FreePDK 45nm (NCSU/Nangate), Operating Frequency: 200MHz)

  21. Incremental High-Level Synthesis and Patch Compilation for High-Level ECO

  22. Conventional High-Level Synthesis. Several phases are applied separately. This prevents incremental synthesis. [Diagram: Allocation, Scheduling, and Binding shown as separate phases; functional units SHFT1, ADD1, ADD2, MUL1; operations (+, x, <<) assigned to Steps 1 to 3; outputs: registers, FSM, datapath.]

  23. Incremental High-Level Synthesis. Each operation is scheduled and bound incrementally, and the hardware is enhanced accordingly. [Diagram: incremental scheduling & binding of operations (+, x, <<) onto SHFT1, ADD1, ADD2, MUL1 over Steps 1 to 3; outputs: registers, FSM, datapath.]
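The idea of scheduling and binding one operation at a time can be sketched with a greedy loop. This is a minimal illustration under stated assumptions, not the algorithm from the presentation: every operation takes one control step, operations arrive in dependency order, and "enhancing the hardware" is modeled as allocating a new functional unit when no unit of the required kind exists. The unit names follow the slide; the operation list is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str, List[str]]  # (operation name, unit kind, dependencies)


def incremental_schedule(ops: List[Op], units: Dict[str, List[str]]):
    """Schedule and bind each operation as it arrives.

    Returns (step_of, binding, units) where `units` may have grown.
    """
    units = {kind: list(names) for kind, names in units.items()}  # work on a copy
    step_of: Dict[str, int] = {}
    binding: Dict[str, str] = {}
    busy: Set[Tuple[int, str]] = set()  # (step, unit) pairs already taken

    for name, kind, deps in ops:
        pool = units.setdefault(kind, [])
        if not pool:                      # no unit of this kind yet: enhance the hardware
            pool.append(f"{kind}1")
        # Earliest step after all dependencies have produced their results.
        step = 1 + max((step_of[d] for d in deps), default=0)
        while True:
            free = [u for u in pool if (step, u) not in busy]
            if free:
                break
            step += 1                     # all units of this kind are busy; try the next step
        busy.add((step, free[0]))
        step_of[name] = step
        binding[name] = free[0]
    return step_of, binding, units


ops = [
    ("a", "ADD", []),
    ("b", "ADD", []),
    ("c", "ADD", []),
    ("d", "MUL", ["a", "b"]),
    ("e", "SHFT", ["d"]),
]
step_of, binding, final_units = incremental_schedule(
    ops, {"ADD": ["ADD1", "ADD2"], "MUL": ["MUL1"]})
print(step_of)
print(binding)
print(final_units)
```

Because earlier decisions are never revisited, adding one more operation only touches the new entry in the schedule, which is the property a patch compiler needs when the hardwired part of the design is already frozen in silicon.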
