Hot Chips, 2006
The Next Generation 65-nm FPGA
Steve Douglass, Kees Vissers, Peter Alfke
Xilinx
August 21, 2006
Structure of the talk
– 65nm technology going towards 32nm
– Virtex-5 family
– Improved I/O
– Benchmarking Virtex-5 LUT6
65-nm Transistor Cross Section
– ~5 atomic layers
– 3 oxide thicknesses for optimum power and performance
– Lower dynamic power
– Maximum performance at lowest AC power
[Figure: process roadmap with nodes 180 nm, 150 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm]
– 1.0 Volt
– 90nm – Low cost
– Triple Oxide – Low power
– 300mm wafers – Low cost
– 12 layer copper, 1 volt core
– New process technology drives down cost
– FPGAs can take advantage of new technology faster than ASICs and ASSPs
– The cost of IC development increases. Therefore customers want to buy reconfigurable and programmable platforms, instead of developing their own.
– FPGA 2010: 32 nm, 5 Billion transistors
Serial I/O
– High-Performance 6-LUT Fabric
– 36Kbit Dual-Port Block RAM / FIFO with ECC
– SelectIO with ChipSync + XCITE DCI
– 550 MHz Clock Management Tile DCM + PLL
– 25x18 Multiplier DSP Slice with Integrated ALU
– More Configuration Options
– with dual 5-input LUT option
– 1.4 times the value for actual logic
– only 1.15 times the cost in silicon area.
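The two ratios above can be combined into a single value-per-area figure. The short calculation below is only a sketch of that arithmetic: the variable names are illustrative, and the baseline is whatever LUT the slide compares against (presumably the previous 4-input LUT, which is an assumption).

```python
# Figures quoted on the slide for the 6-input LUT, relative to the baseline LUT.
logic_value_ratio = 1.4     # "1.4 times the value for actual logic"
silicon_area_ratio = 1.15   # "only 1.15 times the cost in silicon area"

# Logic value delivered per unit of silicon area, relative to the baseline.
value_per_area = logic_value_ratio / silicon_area_ratio
print(f"value per area: {value_per_area:.2f}x")
```

Under these figures, each unit of silicon area carries roughly a fifth more logic value than the baseline.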
[Figure: four LUT elements, each labelled LUT6 / SRL32 / RAM64, each paired with a Register/Latch]
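To make the "dual 5-input LUT option" concrete, here is a minimal behavioural sketch of a 6-input LUT with two outputs. It is not taken from the slides: the function names `lut6_2` and `pack_dual_lut5` are hypothetical, and the bit ordering (a 64-bit truth table, O5 read from the lower 32 bits, O6 from all 64) follows the convention publicly documented for the LUT6_2 primitive, which is assumed here.

```python
def lut6_2(init, a):
    """Return (O6, O5) for a 64-bit truth table `init` and inputs a = [A1..A6]."""
    assert 0 <= init < (1 << 64) and len(a) == 6
    index6 = sum(bit << i for i, bit in enumerate(a))  # A1 is the least significant bit
    index5 = index6 & 0x1F                             # A1..A5 only
    o6 = (init >> index6) & 1                          # full 6-input function
    o5 = (init >> index5) & 1                          # 5-input function in the lower 32 bits
    return o6, o5


def pack_dual_lut5(f_lo, f_hi):
    """Build a 64-bit table holding two 5-input functions (illustrative helper)."""
    init = 0
    for idx in range(32):
        bits = [(idx >> i) & 1 for i in range(5)]
        init |= f_lo(bits) << idx          # lower half, seen on O5
        init |= f_hi(bits) << (idx + 32)   # upper half, seen on O6 when A6 = 1
    return init


def and5(bits):
    return int(all(bits))


def xor5(bits):
    return sum(bits) & 1


INIT = pack_dual_lut5(and5, xor5)
# With A6 tied to 1, O6 is the XOR of A1..A5 and O5 is the AND of A1..A5.
print(lut6_2(INIT, [1, 0, 0, 0, 0, 1]))
```

With A6 tied high, one physical LUT yields two independent functions of the same five inputs, which is the sharing the dual-LUT5 option relies on.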
[Figure legend: Fast Connect, 1 Hop, 2 Hops, 3 Hops]
[Block diagram: ChipSync™ with ISERDES (CLK, DATA), FPGA Fabric, State Machine (INC/DEC), and IDELAY CNTRL with a 175-225 MHz calibration clk]
– n = 2, 3, 4, 5, 6, 7, 8, 10 bits
– Bit alignment, Word alignment, Clock alignment
[Block diagram: ChipSync™ ISERDES (n bits) with CLK and CLKDIV, BUFIO, BUFR, CLK and Data inputs, FPGA Fabric]
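The serial-to-parallel conversion and word alignment listed above can be sketched in software. This is a simplified behavioural model, not the hardware implementation: the function names and the training word are hypothetical, and alignment is modelled as skipping leading bits of the stream rather than reordering the parallel word as the device does.

```python
ALLOWED_WIDTHS = (2, 3, 4, 5, 6, 7, 8, 10)  # the n values listed on the slide


def deserialize(bits, n, skip=0):
    """Group a serial bit stream into n-bit words after dropping `skip` leading bits."""
    if n not in ALLOWED_WIDTHS:
        raise ValueError(f"unsupported width {n}")
    usable = bits[skip:]
    return [usable[i:i + n] for i in range(0, len(usable) - n + 1, n)]


def find_word_alignment(bits, n, training_word):
    """Try each bit offset until every complete word equals the training word."""
    for skip in range(n):
        words = deserialize(bits, n, skip)
        if words and all(word == training_word for word in words):
            return skip
    return None


# Hypothetical training pattern; the stream starts three bits before a word boundary.
training = [1, 1, 0, 0, 1, 0, 1, 0]
stream = [0, 1, 0] + training * 3

skip = find_word_alignment(stream, 8, training)
print(skip, deserialize(stream, 8, skip))
```

A non-periodic training word matters here: a pattern that equals one of its own rotations would match at more than one offset.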
– Data SERDES: 2, 3, 4, 5, 6, 7, 8, 10 bits
– Three-state control SERDES: 1, 2, 4 bits
[Block diagram: ChipSync OSERDES (widths n and m) with CLK and CLKDIV, DCM/PMCD, FPGA Fabric]
MPEG 4 Decoder
[Block diagram: Parser; Inverse scan, Inverse AC DC Prediction; DCT Coeff; Inverse Quantisation / IDCT; Texture/IDCT; Texture Update; Motion Comp.; Copy Controller; Object FIFOs; Shared Memory; Memory Controller; RAM. The labels 1 and 8 also appear in the figure.]
Category   Virtex-4          Virtex-5
Tools      XST/ISE 8.1.02i   XST/ISE 8.2i
Devices    XC4VFX140-11      Virtex5 part
[Block diagram: Eight Ports of Compressed Video In; eight Mpeg4 Decoders; four Memory Controllers; Off Chip Frame Memories (RAM); Eight Ports of De-Compressed 720p Video Out]
1634
14,809
Design Resources   Virtex-4 Used   Virtex-5 Used
Registers          21,248          20,242
LUTs               67,523          44,148
BlockRAMs          233             233
DSP Elements       192             216
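The relative change per resource follows directly from the figures above. The snippet below is a small sketch of that arithmetic; the dictionary and function names are illustrative, and the numbers are copied from the table.

```python
# Resource usage for the eight-decoder MPEG-4 design, as listed in the table.
virtex4 = {"Registers": 21248, "LUTs": 67523, "BlockRAMs": 233, "DSP Elements": 192}
virtex5 = {"Registers": 20242, "LUTs": 44148, "BlockRAMs": 233, "DSP Elements": 216}


def percent_change(old, new):
    """Signed percentage change going from `old` to `new`."""
    return 100.0 * (new - old) / old


for name in virtex4:
    change = percent_change(virtex4[name], virtex5[name])
    print(f"{name:<13} {virtex4[name]:>7,} -> {virtex5[name]:>7,}  ({change:+.1f}%)")
```

The LUT count drops by about a third, while registers and block RAMs stay nearly flat and DSP usage rises slightly.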
[Chart: Virtex-5 vs. Virtex-4 by Number of Inputs (<= 3, 4, 5, 6); vertical axis 5,000 to 35,000]
the data input and can also share a distributed write address
Three independent read
structure in 64 LUTs
[Figure: Virtex-4 implementation built from LUTs. Write Port: common write address and common write data (32-bit write data). Read Port: three independent read addresses, each with its associated data (32-bit read data).]
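A memory with one shared write port and three independent read ports can be modelled in a few lines. The sketch below uses bank replication, a common way to build multi-read memories from single-read RAMs; the slide does not say this is how either device maps the structure, and the class name, the depth of 32 words, and the method names are assumptions for illustration.

```python
class MultiReadRam:
    """One write port, several independent read ports, built from replicated banks."""

    def __init__(self, depth=32, width=32, read_ports=3):
        self.depth = depth
        self.mask = (1 << width) - 1
        # One copy of the memory per read port; all copies hold the same data.
        self.banks = [[0] * depth for _ in range(read_ports)]

    def _check(self, addr):
        if not 0 <= addr < self.depth:
            raise IndexError(f"address {addr} out of range")

    def write(self, addr, data):
        """Common write address and common write data go to every bank."""
        self._check(addr)
        for bank in self.banks:
            bank[addr] = data & self.mask

    def read(self, port, addr):
        """Each read port uses its own address on its own bank."""
        self._check(addr)
        return self.banks[port][addr]


ram = MultiReadRam()
ram.write(5, 0xDEADBEEF)
ram.write(9, 0x12345678)
# Three reads with independent addresses.
print(hex(ram.read(0, 5)), hex(ram.read(1, 9)), hex(ram.read(2, 5)))
```

Replication trades storage for ports: every extra read port costs another full copy of the data, which is why sharing the write address and data input across LUTs matters for area.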
– 1269 LUT4s in Virtex-4, MB 4.0
– 1400 LUT6s in Virtex-5, MB 5.0
Use new 6 LUT, 2 stage deeper pipe, 10% more MHz, 39% better performance
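The two headline percentages can be separated into a clock contribution and a per-clock contribution. This is only a sketch under an assumption the slide does not state, namely that performance scales as clock frequency times work done per clock; the variable names are illustrative.

```python
# Figures quoted on the slide for MB 5.0 on Virtex-5 versus MB 4.0 on Virtex-4.
clock_ratio = 1.10         # "10% more MHz"
performance_ratio = 1.39   # "39% better performance"
luts_v4, luts_v5 = 1269, 1400

# Assumption: performance = clock frequency x work per clock.
per_clock_ratio = performance_ratio / clock_ratio
lut_ratio = luts_v5 / luts_v4

print(f"implied gain per clock: {100 * (per_clock_ratio - 1):.1f}%")
print(f"LUT count ratio:        {lut_ratio:.2f}x")
```

On that assumption, most of the improvement would come from more work per clock rather than from the faster clock, at about 10% more LUTs.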
[Block diagram: Instruction-side bus interface (IOPB, ILMB) and Data-side bus interface (DOPB, DLMB), each with a Bus IF; Program Counter; Instruction Buffer; Instruction Decode; Register File 32X32b; Add/Sub; Shift/Logical; Multiply]
– Suite of 74 designs run against ISE8.1i
– Slow speedgrade (-10) Virtex-4 compared to slow speedgrade (-1) Virtex-5
– As high as 56% advantage for some designs
[Chart: Percent Improvement of Virtex-5 over Virtex-4, with Avg of Designs marked; axis ticks 10 to 70]
                        5VLX30   5VLX50   5VLX85   5VLX110   5VLX220   5VLX330
Logic Cells             30,720   46,080   82,944   110,592   221,184   331,776
LUT6/FFs                19,200   28,800   51,840   69,120    138,240   207,360
Distributed RAM Kbits   320      480      840      1,120     2,280     3,420
Block RAM Kbits         1,152    1,728    3,456    4,608     6,912     10,368
CMTs                    2        6        6        6         6         6
DSP48E Slices           32       48       48       64        128       192
Total I/O Banks         13       17       17       23        23        35
EasyPath: No, Yes, Yes, Yes, Yes, plus one further "No" whose column cannot be recovered

Package   Size   IO
FF324     19     220, 220, 220
FF676     27     440, 400, 440, 440, 440
FF1153    35     800, 560, 800, 800
FF1760    42.5   1,200, 1,200, 560, 800
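The Logic Cells and LUT6/FFs figures in the device table are related by a constant factor, which can be checked directly. The snippet is a sketch using only the numbers listed above; the list names are illustrative.

```python
devices = ["5VLX30", "5VLX50", "5VLX85", "5VLX110", "5VLX220", "5VLX330"]
logic_cells = [30720, 46080, 82944, 110592, 221184, 331776]
lut6_ffs = [19200, 28800, 51840, 69120, 138240, 207360]

# Ratio of quoted logic cells to physical LUT6/FF pairs, per device.
ratios = [cells / luts for cells, luts in zip(logic_cells, lut6_ffs)]

for name, ratio in zip(devices, ratios):
    print(f"{name:<8} {ratio:.2f} logic cells per LUT6/FF")
```

Every device in the table uses the same factor, so the logic cell count is a scaled equivalent of the LUT6 count rather than an independent measurement.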