Post-Silicon Patchable Hardware
Masahiro Fujita
VLSI Design and Education Center (VDEC), The University of Tokyo
July 22nd, 2011
Respin Statistics (North America)
Respin is becoming more frequent
[Figure: first-silicon success rate (0% to 100%) by year, 1998 to 2004; data labels 48%, 44%, 39%, 33%]
Fujita Lab. – VLSI Design and Education Center - University of Tokyo 2
[G. S. Spirakis, DATE 2006]
Manufacturing Cost
Respin risk is increasing dramatically
[Figure: mask set cost (US$, $1M to $5M) by technology node: 90nm, 65nm, 45nm, 32nm]
[Nikkei Electronics, 2008]
Causes for Respins
Logic and functional errors are the leading cause
IC/ASIC Designs Having One or More Re-spins by Type of Flaw:
Logic/Function   91%
Clock            36%
Fast Path        32%
Slow Path        26%
Delay/Glitch     26%
Power            21%
Yield            19%
Analog           19%
Firmware         17%
Mixed Signal     15%
IR Drop          15%
[Collett International Research 2005]
Conventional SoC Design Flow
[Figure: design flow. Pre-silicon: High-Level Description, High-Level Synthesis, machine-generated RTL, RTL Verification/Simulation, Logic Synthesis, Place & Route. Post-silicon: SoC, Error Detection, RTL Bug Localization, RTL Bug Fix. Annotations: "Need to Understand Machine-Generated RTL"; "75% of the whole development time [Source: Intel 2007]"]
Proposed Patchable SoC Design Flow
[Figure: design flow. Pre-silicon: High-Level Description, High-Level Verification/Simulation, High-Level Synthesis of Patchable HW, Logic Synthesis, Place & Route. Post-silicon: Patchable SoC, Error Detection, Bug Localization, Bug Fix, Patch Compilation, Patch (high-level ECO). Annotation: "No Respin Needed!"]
Proposed Patchable Hardware
Efficeum
- Offers behavioral-level programmability using a patchable controller
[Figure: custom datapath (ALU1, ALU2) driven by a patchable controller made of a hardwired FSM and a patch FSM]
Partially-Programmable Circuit (PPC)
- Offers logic-level programmability using a mixed gate/LUT circuit
Efficeum:
An Energy-Efficient Patchable Accelerator For Post-Silicon Engineering Changes
Energy Efficiency vs. Programmability
Energy Efficiency of 90nm OFDM
  Fixed-function HW: 200 GOPS/W
  Embedded Proc.: 4 GOPS/W (50X!)
  Laptop Proc.: 0.05 GOPS/W (4,000X!)
High Performance (>100 GOPS) under ~1W Power/Thermal Constraints
Energy efficiency (in [GOPS/W] or [J/op])
  How much computation can be done in a given energy
  Slowing down the chip reduces power but not efficiency
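The ratios quoted on this slide follow from the GOPS/W figures, and J/op is simply the reciprocal of GOPS/W scaled by 1e9. A minimal sketch of that arithmetic, with function and dictionary names chosen here for illustration (they are not from the slides):

```python
# GOPS/W figures quoted on the slide for the 90nm OFDM example.
GOPS_PER_WATT = {
    "fixed_function_hw": 200.0,
    "embedded_proc": 4.0,
    "laptop_proc": 0.05,
}


def joules_per_op(gops_per_watt: float) -> float:
    """1 GOPS/W is 1e9 operations per joule, so J/op is the reciprocal."""
    return 1.0 / (gops_per_watt * 1e9)


def efficiency_ratio(better: str, worse: str) -> float:
    """How many times more energy-efficient `better` is than `worse`."""
    return GOPS_PER_WATT[better] / GOPS_PER_WATT[worse]


def power_watts(throughput_gops: float, gops_per_watt: float) -> float:
    """Power drawn when sustaining a throughput at a fixed efficiency."""
    return throughput_gops / gops_per_watt
```

At 200 GOPS/W, sustaining 100 GOPS costs 0.5 W. Halving the throughput halves the power to 0.25 W, but the energy per operation stays at 5 pJ, which is the sense in which slowing the chip down does not improve efficiency.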
Fixed-Function Accelerator
Achieves high energy efficiency by customization:
Hardwired controller → No reprogrammability Highly-customized datapath → Low flexibility
[Figure: a hardwired controller sends control signals to a datapath made of a local store, registers (Reg 1, Reg 2, Reg 3, ...), sparse interconnect networks, and functional units (comparator, multiplier, ALU1, ALU2, ...)]
Proposed Patchable Accelerator
Behavioral reprogrammability by control patching
Increased flexibility by adding register file via data bus
[Figure: the fixed-function accelerator (hardwired controller, local store, registers Reg 1 to Reg 3, sparse interconnect networks, comparator, multiplier, ALU1, ALU2) extended with patch logic on the control bus and a register file on the data bus]
Patch Logic
[Figure: patch logic. The program counter feeds comparators (=? against PC1, PC2, ...; >PCpatch?) that select between the control signal from the hardwired controller's control signal memory and the patch control signal from the patch memory (entries PC1', PC2', ...)]
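One possible reading of the patch logic figure is a selector: when the program counter matches a patched address, or lies beyond PCpatch, the control word comes from the patch memory instead of the hardwired controller. The behavioral sketch below assumes exactly that reading. The class, its fields, and the control-word format are hypothetical and only serve to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ControlWord:
    signals: str   # stand-in for the datapath control signals
    next_pc: int


@dataclass
class PatchableController:
    hardwired: dict[int, ControlWord]   # fixed at fabrication
    pc_patch: int                       # assumed: highest hardwired PC
    patch: dict[int, ControlWord] = field(default_factory=dict)

    def fetch(self, pc: int) -> ControlWord:
        # "=?" comparators: the PC matches a patched address.
        # ">PCpatch?": the PC lies in the patch-only region.
        if pc in self.patch or pc > self.pc_patch:
            return self.patch[pc]
        return self.hardwired[pc]

    def run(self, start_pc: int, steps: int) -> list[str]:
        """Return the control signals issued over `steps` control steps."""
        trace, pc = [], start_pc
        for _ in range(steps):
            word = self.fetch(pc)
            trace.append(word.signals)
            pc = word.next_pc
        return trace
```

Writing two entries into `patch` (one overriding a hardwired step to redirect its next PC, one holding the new step) is enough to splice an extra control step into the hardwired sequence without touching the hardwired table.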
Patching Example (1/2)
Scheduling Result of Initial Design
[Figure: schedule table with columns PC, ALU1, ALU2, MUL1, Next PC. Rows PC 1 to 3 belong to the hardwired logic and rows PC 4 and 5 to the patch logic; shown beside the dataflow graph for the initial design]
Patching Example (2/2)
Scheduling Result After Engineering Change
[Figure: schedule table with columns PC, ALU1, ALU2, MUL1, Next PC after the engineering change. Rows PC 1 to 3 belong to the hardwired logic and rows PC 4 and 5 to the patch logic; shown beside the dataflow graph after EC]
Patching-Based Post-Silicon ECO Flow
[Figure: ECO flow. The C program goes through high-level synthesis, with RF and patch logic inserted, to produce the fixed-function HW (Efficeum). After a post-silicon ECO (spec. change & bug fix), the post-ECO program goes through patch compilation, which computes the difference between the two programs and produces a patch that is written into the patch memory]
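The "computing the difference between two programs" step can be pictured, in a much simplified form, as comparing two control schedules and keeping only the steps that are new or changed. The sketch below assumes a schedule is a mapping from PC to a (control signals, next PC) pair; this representation and both function names are hypothetical. The actual patch compiler also has to schedule and bind under the fixed datapath, which this sketch ignores.

```python
def compute_patch(original: dict, post_eco: dict) -> dict:
    """Keep only the control steps that are new or changed after the ECO."""
    return {pc: word for pc, word in post_eco.items() if original.get(pc) != word}


def apply_patch(original: dict, patch: dict) -> dict:
    """Model writing into patch memory: patched entries override hardwired ones."""
    return {**original, **patch}


# Hypothetical three-step schedule and a post-ECO version with one extra step.
original = {1: ("op_a", 2), 2: ("op_b", 3), 3: ("op_c", 1)}
post_eco = {1: ("op_a", 4), 2: ("op_b", 3), 3: ("op_c", 1), 4: ("op_d", 2)}
patch = compute_patch(original, post_eco)
```

Here `patch` holds two entries (PC 1 and PC 4), so the patch size grows with the number of touched control steps rather than with the size of the program.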
Experimental Setup
Example: 8x8 IDCT
Technology: FreePDK 45nm
Logic Synthesis: Synopsys Design Compiler Ultra
  High effort options with gated clock optimization
P&R: Cadence SoC Encounter
Simulation: Synopsys VCS
Power/timing analysis: Synopsys PrimeTime PX
  Simulation results are used for power calculation
Energy efficiencies (GOPS/W) are compared
Energy Efficiency Comparison
[Figure: energy efficiency of the 8x8 IDCT (FreePDK 45nm technology) from No Patching to Fully-Patched; data labels 6%, 48%, 89%]
Offers a tradeoff between efficiency and programmability
Area & Performance Comparison
[Figure: area and performance comparison; data labels 20%, 5%, 5X Smaller, Up to 40% Increase]
Area Comparison
[Figure: area of the fully-programmable accelerator, the single-function hardwired accelerator, and Efficeum; data labels 4x reduction, 18% increase]
(Technology: FreePDK 45nm (NCSU/Nangate), Operating Frequency: 200MHz)
Power Comparison
[Figure: power of the fully-programmable accelerator, the single-function hardwired accelerator, and Efficeum; data labels 6x reduction, 13% increase]
(Technology: FreePDK 45nm (NCSU/Nangate), Operating Frequency: 200MHz)
Incremental High-Level Synthesis and Patch Compilation For High-Level ECO
Conventional High-Level Synthesis
Several phases are applied separately
This prevents incremental synthesis
[Figure: three separate phases. Scheduling places the operations (+, x, <<) into Steps 1 to 3; allocation selects the resources ADD1, ADD2, MUL1, SHFT1; binding maps operations and registers onto them, yielding the FSM and the datapath]
Incremental High-Level Synthesis
Each operation is scheduled and bound incrementally, and the hardware is enhanced accordingly
Incremental Scheduling & Binding
[Figure: operations (+, <<, x) are added one at a time over Steps 1 to 3; the FSM, the registers, and the datapath resources (ADD1, ADD2, MUL1, SHFT1) are extended as each operation is scheduled and bound]
Patch Compilation
Same as incremental synthesis, except the datapath enhancement is not allowed (only FSM is enhanced)
Incremental Scheduling & Binding
[Figure: operations (+, <<, x) are added one at a time over Steps 1 to 3 on a fixed datapath (ADD1, ADD2, MUL1, SHFT1, registers); only the FSM is extended]
Incremental Scheduling Procedure
Extension of Swing Modulo Scheduling for VLIW compilers [Llosa et al., PACT '96]
[Figure: scheduling order and register lifetimes for two schedulers]
Conventional Scheduling (Top Down): very long register lifetime, 6 registers required
Incremental Swing Scheduling (Mix of Top Down & Bottom Up): shorter register lifetime, only 4 registers required
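The link between scheduling order and register count can be checked on a toy graph. The sketch below counts how many values are live across each control-step boundary for a given schedule. The graph is invented for illustration (a four-operation chain `x1..x4` with side operands `s2..s4`); it is not the graph from the slide and does not reproduce the 6-versus-4 figures.

```python
def registers_required(schedule: dict[str, int], edges: list[tuple[str, str]]) -> int:
    """Peak number of values that must be held between consecutive steps."""
    last_use: dict[str, int] = {}
    for producer, consumer in edges:
        last_use[producer] = max(last_use.get(producer, 0), schedule[consumer])
    peak = 0
    for t in range(1, max(schedule.values())):
        # A value is live across boundary t -> t+1 if it was produced at or
        # before step t and is still needed after step t.
        live = sum(1 for op, end in last_use.items() if schedule[op] <= t < end)
        peak = max(peak, live)
    return peak


EDGES = [("x1", "x2"), ("x2", "x3"), ("x3", "x4"),
         ("s2", "x2"), ("s3", "x3"), ("s4", "x4")]

# Top-down (as soon as possible): every side operand is computed in step 1.
top_down = {"x1": 1, "x2": 2, "x3": 3, "x4": 4, "s2": 1, "s3": 1, "s4": 1}
# Side operands placed just before their consumers instead.
near_consumer = {"x1": 1, "x2": 2, "x3": 3, "x4": 4, "s2": 1, "s3": 2, "s4": 3}
```

On this toy graph the top-down schedule needs 4 registers and the near-consumer schedule needs 2, because values such as `s4` no longer wait several steps for their consumer.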
Swing Scheduling Procedure
Mix of top-down and bottom-up scheduling
Phase 1: Bottom-Up Scheduling of Critical Path and Their Ancestors
Phase 2: Top-Down Scheduling of the Descendants of Scheduled Operations
Phase 3: Bottom-Up Scheduling of the Ancestors of Scheduled Operations
Incremental Step Insertion
A novel technique enabling incremental swing scheduling
During swing scheduling, a new control step is inserted between the scheduled steps when needed
[Figure: control-step sequences (Step A to Step E) before and after new steps are inserted between already scheduled steps]
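Step insertion can be sketched as shifting every operation scheduled after the insertion point down by one step, which leaves an empty step for the new operation while preserving the order of everything already scheduled. The representation (operation name mapped to a step number) and the function name are assumptions made for this illustration.

```python
def insert_step_after(schedule: dict[str, int], step: int) -> dict[str, int]:
    """Open an empty control step right after `step` by shifting later ops."""
    return {op: s + 1 if s > step else s for op, s in schedule.items()}


before = {"a": 1, "b": 2, "c": 3}
after = insert_step_after(before, 1)
after["new_op"] = 2  # the freshly opened step now holds the new operation
```

Because all later operations move by the same amount, any dependence that was satisfied before the insertion is still satisfied afterwards.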
Incremental Binding Procedure
For each operation, all possible combinations of (resource, registers) are examined
If no such binding is found, new interconnects between resource and registers are introduced
[Figure: incremental scheduling and binding on a datapath with ADD1, ADD2, MUL1, SHFT and registers, followed by datapath enhancement. Example: a multiplier exists but no register-to-multiplier interconnect exists]
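The binding rule above can be sketched as a search over (resource, destination register) pairs, falling back to adding interconnects when no pair is already wired. The data model is an assumption made for this sketch: resources are a name-to-type map, interconnects are (source, destination) pairs, and register occupancy is ignored. Passing `allow_enhancement=False` models patch compilation, where the datapath is frozen.

```python
from itertools import product


def bind(op_type, src_regs, resources, registers, wires, allow_enhancement=True):
    """Return (resource, dest_register, added_wires), or None if binding fails."""
    candidates = [name for name, kind in resources.items() if kind == op_type]
    for res, dst in product(candidates, registers):
        needed = {(s, res) for s in src_regs} | {(res, dst)}
        if needed <= wires:
            return res, dst, set()          # existing datapath suffices
    if not allow_enhancement or not candidates:
        return None                         # patch compilation must give up here
    res, dst = candidates[0], registers[0]  # simplest choice, not an optimized one
    added = ({(s, res) for s in src_regs} | {(res, dst)}) - wires
    wires.update(added)                     # datapath enhancement
    return res, dst, added


resources = {"ADD1": "add", "MUL1": "mul"}
registers = ["R1", "R2"]
wires = {("R1", "ADD1"), ("R2", "ADD1"), ("ADD1", "R1")}
```

With this datapath an addition binds directly, while a multiplication fails under `allow_enhancement=False` and otherwise succeeds by adding the missing register-to-multiplier and multiplier-to-register wires.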
Experimental Setup
The proposed method has been implemented in our high-level synthesis framework Cyneum
  Incremental high-level synthesis
  Patch compiler for Efficeum
Example: 5 benchmark designs
  C programs of about 100 lines
  Functions from IDCT, ADPCM, MPEG
  Post-ECO examples are generated by random graph perturbation (next slide)
Evaluated the quality of the method through the patch size and compilation time
Generating ECO Examples
The following graph perturbations are randomly applied to the CDFG
[Figure: original CDFG and three perturbed variants]
  Perturbation 1: Opcode Change
  Perturbation 2: Operand Change
  Perturbation 3: Introducing A New Operation
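The three perturbations can be sketched on a small dataflow graph. In the sketch below a graph is a list of `(opcode, operands)` nodes, node `i` is referred to as `n{i}`, and operands may only name primary inputs or earlier nodes, so the graph stays acyclic. The opcode set, the naming scheme, and all function names are assumptions for illustration and are not taken from Cyneum.

```python
import random

OPCODES = ["+", "-", "*", "<<"]


def is_well_formed(cdfg, inputs):
    """Every operand names a primary input or an earlier node."""
    return all(
        arg in set(inputs) | {f"n{j}" for j in range(i)}
        for i, (_, args) in enumerate(cdfg)
        for arg in args
    )


def perturb(cdfg, inputs, rng):
    """Apply one random perturbation; return (new_cdfg, kind)."""
    g = [(op, list(args)) for op, args in cdfg]       # copy, leave input intact
    kind = rng.choice(["opcode", "operand", "new_op"])
    i = rng.randrange(len(g))
    op, args = g[i]
    if kind == "opcode":
        g[i] = (rng.choice([o for o in OPCODES if o != op]), args)
    elif kind == "operand":
        k = rng.randrange(len(args))
        usable = inputs + [f"n{j}" for j in range(i)]
        args[k] = rng.choice([s for s in usable if s != args[k]])
    else:
        usable = inputs + [f"n{j}" for j in range(len(g))]
        g.append((rng.choice(OPCODES), [rng.choice(usable), rng.choice(usable)]))
    return g, kind


def apply_perturbations(cdfg, inputs, m, rng):
    """ECO of size M: apply M random perturbations in sequence."""
    for _ in range(m):
        cdfg, _ = perturb(cdfg, inputs, rng)
    return cdfg
```

Seeding `random.Random` makes each generated post-ECO design reproducible, which is convenient when many designs are generated for each value of M.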
Evaluation of Patch Compiler
For each benchmark, random CDFG perturbation is applied M times. For each M, 100 different post-ECO designs are generated and then patches are compiled.
[Figure: two plots against ECO Size M (#Perturbations): Average Patch Size (#FSM States) and Average Compilation Time [sec.]]
PPC:
Increasing Yield Using Partially-Programmable Circuits
A collaborative work with Prof. Shigeru Yamashita (Ritsumeikan Univ.)
Design For Yield
At physical level, there are many techniques available for yield/defect tolerance enhancement
At logic level, the following techniques can be applied for each module
  TMR: Voting
  DMR: If one module is defective, the other can be used
  Reconfigurable devices: synthesize not to use defective parts
Too much overhead in area and performance
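For reference, the two module-level schemes can be modeled in a few lines. The sketch below shows a bitwise majority voter for TMR and a selector for DMR; both function names are chosen here for illustration. It also makes the overhead visible: TMR needs three copies of the module plus a voter, DMR two copies plus a way to know which one is defective.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three module outputs: masks one faulty module."""
    return (a & b) | (b & c) | (a & c)


def dmr_select(out0: int, out1: int, module0_defective: bool) -> int:
    """Use the other module when one is known to be defective."""
    return out1 if module0_defective else out0
```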
Objective of This Work
Enhance the defect tolerance by using a Partially-Programmable Circuit (PPC)
  PPC is a hybrid LUT/gate circuit
  To correct a single defect, full programmability such as FPGAs is unnecessary
  A defective wire can be made redundant by reprogramming LUTs in PPC
Propose a design methodology
  Synthesis of PPC
    Where to put LUTs
  How to reprogram LUTs for defective wires
PPC (Partially-Programmable Circuit)
[Figure: PPC between primary inputs and primary outputs, with conventional gates, LUTs, and a MUX]
Non-Programmable Part consisting of conventional gates
Programmable Part consisting of LUTs and MUXs
Defect Correction in PPC
Find out that a wire ci is defective
By reprogramming LUTs, the wire ci becomes redundant
Call ci a Robust Connection (RC)
[Figure: PPC with conventional gates, LUTs, and a MUX; the defective wire ci is highlighted]
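Whether a wire is a robust connection can be checked, for a tiny circuit, by asking if some LUT content still produces the specified output once the wire is stuck. The sketch below does this for a hypothetical circuit invented for illustration: a gate computes `w = a AND b`, and a LUT computes `f = w OR c`. With only `(w, c)` at the LUT pins a stuck `w` cannot be repaired; after adding redundant wires from `a` and `b` to the LUT it can. The function names and the circuit are assumptions, not the authors' tool.

```python
from itertools import product


def reprogram_lut(spec, lut_pins, n_inputs):
    """Return LUT contents that realize `spec`, or None if none exist.

    spec(pi) gives the desired output bit for primary inputs `pi`;
    lut_pins(pi) gives the bits seen at the LUT pins (defect included).
    """
    table = {}
    for pi in product((0, 1), repeat=n_inputs):
        pins, want = lut_pins(pi), spec(pi)
        if table.setdefault(pins, want) != want:
            return None   # same pin values would need two different outputs
    return table


def spec(pi):
    a, b, c = pi
    return (a & b) | c


def pins(stuck_at, redundant_wires):
    """LUT pin values with wire w healthy (None) or stuck at 0/1."""
    def at_lut(pi):
        a, b, c = pi
        w = (a & b) if stuck_at is None else stuck_at
        return (w, c, a, b) if redundant_wires else (w, c)
    return at_lut
```

In this toy case `reprogram_lut(spec, pins(0, True), 3)` returns a table, so `w` is a robust connection once the redundant wires are present, whereas `reprogram_lut(spec, pins(0, False), 3)` returns `None`.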
PPC Example: Initial Circuit
[Figure: initial circuit between primary inputs and primary outputs, with gate nodes N1 to N9 and LUTs]
LUTs are used partially in the circuit
There is no redundancy now
PPC Example: Redundancy Addition 1
[Figure: the circuit (nodes N1 to N9, LUTs) with one wire added to a LUT]
By adding this wire, some wires become Robust Connections (RCs)
RC: Wires which can become redundant by reprogramming LUTs
NRC: Wires which are not RCs
PPC Example: Redundancy Addition 2
[Figure: the circuit (nodes N1 to N9, LUTs) after a further redundant wire is added]
PPC Example: Redundancy Addition 3
[Figure: the circuit (nodes N1 to N9, LUTs) after a further redundant wire is added]
PPC Example: Final Circuit
Since we assume a single defect, multiple redundant connections to an LUT are multiplexed
[Figure: final circuit between primary inputs and primary outputs, with nodes N1 to N9, LUTs, and a MUX feeding a LUT]
Colored wires: Robust Connection (RC)
Black wires: Non-Robust Connection (NRC)
CSPF: Flexibility of Logic Circuits
[Figure: circuit with internal signals g1 and g2 and output f; truth tables of the logic functions and of their CSPFs, where * marks a don't-care entry]
SPFD: Flexibility of LUT Circuits
[Figure: LUT L1 with inputs g1 and g2 and output f; truth tables illustrating g1's flexibility by SPFDs]
Proposed Synthesis Method
Basic procedure
1. Perform a LUT mapping
2. Determine LUTs to keep
   Needs a better heuristic
   Reconvergence points, non-critical nodes
3. Perform a technology re-mapping
4. Add redundant wires to LUTs
   How to find good wires?
   Currently, wires are identified exhaustively
Preliminary Experiments
1. Mapping to K-input LUTs (K=3,4,5)
2. Re-mapping with keeping LUTs at the outputs
3. For each LUT, if connecting a wire to the LUT makes another wire RC, then it is selected. Terminate if the number of LUT inputs is 6.
[Figure: circuit between primary inputs and primary outputs with LUTs kept at the outputs]
Experimental Results
#Connections which are robust to stuck-at-0/1

            Stuck-at-0                        Stuck-at-1
            Robust              Non-Robust    Robust              Non-Robust
Circuit     Original   Added                  Original   Added
alu2        586        13       25            582        20       29
alu4        1093       28       92            1070       25       115
apex6       591        86       106           589        98       108
rot         421        161      218           411        172      228
too_large   472        15       256           453        9        275
vda         1251       153      221           1318       213      154
C880        185        32       144           212        57       117
C1355       219        225      143           390        176      182
C1908       626        37       66            599        36       93
C2670       710        59       176           704        66       182
C3540       1500       63       210           1462       52       248