Firmware Projects and Hardware Demonstrators
Yuri Gershtein, Rutgers University for Tracklet Team
Technical Review
28-Aug-2017
Yuri Gershtein 8/28/2017
Hardware
communication (?)
▪ Massively parallel track reconstruction
– Divide the detector into 28 φ sectors, each spanning all η (a 2 GeV track spans at most 2 sectors)
– Time-multiplexed system (TMUX = 6)
▪ Consider a SECTOR, consisting of one board (SECTOR PROCESSOR, SP), as the top-level FW unit
▪ Tracklets are formed within a sector
▪ A 2 GeV track can project into the adjacent sectors
▪ Projections must be sent to other sectors for stub matching
▪ Minimal data duplication
▪ One SP board for each sector → need inter-SP communication
With 25 Gb/s DTC->SP links, there is enough bandwidth to duplicate data to the SPs instead of using inter-SP communication. This reduces complexity and latency.
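The time-multiplexing idea above can be illustrated with a short sketch. It assumes plain round-robin routing of bunch crossings to TMUX copies of each sector's SP, and that every sector has one board per time slice; the function names are hypothetical and the board count is arithmetic under that assumption, not a statement about the final system.

```python
# Sketch of time-multiplexed event routing. Assumptions (not from the firmware):
# plain round-robin over TMUX time slices, one SP per sector per time slice.
N_SECTORS = 28  # phi sectors, from the slide
TMUX = 6        # time-multiplexing factor, from the slide
BX_NS = 25      # LHC bunch spacing in ns


def time_slice(bx: int, tmux: int = TMUX) -> int:
    """Index of the SP copy (time slice) that receives bunch crossing `bx`."""
    return bx % tmux


def boards_needed(n_sectors: int = N_SECTORS, tmux: int = TMUX) -> int:
    """SP boards if every sector has one board per time slice (assumption)."""
    return n_sectors * tmux


def ns_between_events(tmux: int = TMUX) -> int:
    """Time a given SP has per event before its next event arrives."""
    return tmux * BX_NS


if __name__ == "__main__":
    print([time_slice(bx) for bx in range(8)])  # [0, 1, 2, 3, 4, 5, 0, 1]
    print(boards_needed())                       # 168
    print(ns_between_events())                   # 150
```

With TMUX = 6 each board sees a new event only every 150 ns, which is the time budget that reappears in the latency measurements.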
param estimate
▪ There are two firmware projects covering half of a sector
▪ Infrastructure code (moving data, etc.) is shared between the projects
▪ Each project presents different challenges
▪ Hybrid + Disk project
▪ ½ barrel project
Seeding layer pairs: L1+L2, L3+L4, L5+L6, L1+D1, D1+D2, D3+D4
▪ The FW is stored in a GitHub repository for efficient development by many people
Figure: project generation flow. A master config is reduced by SubProject.py and turned into Verilog code by Wires.py; C++ or Python tracklet emulation provides input stubs and LUTs; the design runs in Vivado simulation and in hardware (from the 'bit' file) and produces L1 tracks, which are written to output files. Legend: project generation, Verilog code, software, firmware simulation, processing in hardware.
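The config-driven generation step can be pictured with a toy sketch in the spirit of SubProject.py and Wires.py. The connection format, memory names, 36-bit width and emitted Verilog below are all hypothetical illustrations; the real scripts and config files are not reproduced here.

```python
# Toy sketch of config-driven wiring generation. The config format, memory
# names, bus width and emitted Verilog are hypothetical, not the real files.
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Connection = Tuple[str, str, str]  # (memory, writing module, reading module)

# Hypothetical "master config": which module writes and reads each memory.
MASTER_CONFIG: List[Connection] = [
    ("SL1_L1", "LR1", "VMR_L1"),
    ("VMS_L1", "VMR_L1", "TE_L1L2"),
    ("SP_L1L2", "TE_L1L2", "TC_L1L2"),
    ("TPAR_L1L2", "TC_L1L2", "PR_L3"),
]


def reduce_config(config: Iterable[Connection], keep: Set[str]) -> List[Connection]:
    """Keep only connections whose writer and reader are both in the sub-project."""
    return [c for c in config if c[1] in keep and c[2] in keep]


def emit_wires(config: Iterable[Connection], width: int = 36):
    """Return Verilog wire declarations and, per module, its port connections."""
    wires: List[str] = []
    ports: Dict[str, List[str]] = defaultdict(list)
    for mem, writer, reader in config:
        wires.append(f"wire [{width - 1}:0] {mem}_data;")
        ports[writer].append(f".{mem}_out({mem}_data)")
        ports[reader].append(f".{mem}_in({mem}_data)")
    return wires, dict(ports)


if __name__ == "__main__":
    sub = reduce_config(MASTER_CONFIG, {"LR1", "VMR_L1", "TE_L1L2"})
    wires, ports = emit_wires(sub)
    print("\n".join(wires))
    print(ports["VMR_L1"])
```

Keeping the wiring in one master config and deriving sub-projects from it means a reduced test project and the full project cannot drift apart in how modules are connected.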
Processing steps:
▪ Stub organization
▪ Forming tracklets
▪ Organize tracklet projections
▪ Match tracklet projections to stubs
▪ Track fit
▪ Projection transmission to neighbors
▪ Match transmission
1/4 barrel project, from stub input to track output (not all connections shown):
▪ Processing steps (red) implement the algorithm
▪ Each step takes a predetermined amount of time
▪ Duplicate removal is the next step
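Because each step takes a predetermined amount of time, the latency is fixed but a step can only handle a bounded number of inputs per event. The sketch below assumes one input item per clock tick and a fixed tick budget per step, with items beyond the budget truncated; both assumptions and the function names are illustrative, not taken from the firmware.

```python
# Sketch of fixed-time processing steps. Assumptions (not from the firmware):
# a module handles one input item per clock tick and gets a fixed tick budget
# per event; items beyond the budget are dropped (truncated).
from typing import List, Sequence, Tuple


def fixed_time_step(items: Sequence[int], budget_ticks: int) -> Tuple[List[int], int]:
    """Process at most `budget_ticks` items; return (kept items, number truncated)."""
    kept = list(items[:budget_ticks])
    return kept, max(0, len(items) - budget_ticks)


def run_chain(items: Sequence[int], budgets: Sequence[int]) -> Tuple[List[int], int, int]:
    """Pass one event through a chain of steps with fixed budgets.

    Returns (surviving items, total truncated, latency in ticks). The latency
    is the sum of the budgets, so it does not depend on the event's content;
    per-module overheads are ignored here.
    """
    current = list(items)
    total_truncated = 0
    for budget in budgets:
        current, lost = fixed_time_step(current, budget)
        total_truncated += lost
    return current, total_truncated, sum(budgets)


if __name__ == "__main__":
    out, lost, latency = run_chain(range(40), [36, 36, 36])
    print(len(out), lost, latency)  # 36 4 108
```

The trade-off is visible directly: latency is deterministic, and busy events pay for it through truncation, which is why load balancing matters.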
boards
Implemented in hardware
DTC DTC
▪ CTP7 board (developed at U Wisconsin)
▪ Xilinx VC709 board (OSU, Cornell, Rutgers)
▪ Test stands at Cornell, CERN, and Rutgers (4 CTP7 each)
✦ Provides stub inputs for both the central sector and its neighbors.
✦ Neighbor communication of projections and matches.
TMux = 4 or TMux = 6.
track at track sink.
▪ single muon, ttbar
sector board
inter-board communication
when:
into the memory
sent
crossing in binary
Latency = Trk_Clk - Start - BX*36
– Trk_Clk: arrival of track at sink
– Start: first stub sent (ev#1)
– 36 ticks of the 240 MHz clock in 150 ns
Until 1st possible track arrives at sink
Until last possible track arrives at sink
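The latency formula above can be written out as a small helper. This sketch assumes all quantities are 240 MHz tick counts, that Start is the tick when the first stub of event #1 is sent, and that BX counts events from 0 with a new event injected every 36 ticks; the counter values in the example are made up.

```python
# Sketch of the latency bookkeeping on the slide (interpretation assumed):
#   Latency = Trk_Clk - Start - BX*36, all in 240 MHz clock ticks,
# Start: tick at which the first stub of event #1 was sent,
# Trk_Clk: tick at which a track arrives at the sink,
# BX: the event's index in the sequence (counted from 0 here), with a new
# event injected every 36 ticks (150 ns).
CLK_MHZ = 240
TICKS_PER_EVENT = 36


def latency_ticks(trk_clk: int, start: int, bx: int) -> int:
    """Latency of a track relative to the time its own event was injected."""
    return trk_clk - start - bx * TICKS_PER_EVENT


def ticks_to_ns(ticks: int, clk_mhz: int = CLK_MHZ) -> float:
    """Convert clock ticks to nanoseconds."""
    return ticks * 1000.0 / clk_mhz


if __name__ == "__main__":
    lat = latency_ticks(trk_clk=1000, start=100, bx=2)  # hypothetical counter values
    print(lat, ticks_to_ns(lat))         # 828 3450.0
    print(ticks_to_ns(TICKS_PER_EVENT))  # 150.0
```

Subtracting BX*36 removes each event's own injection offset, so every event's latency is measured from the moment its stubs started to enter the system.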
TMUX 6, 240 MHz CLK
TMUX 6, 240 MHz CLK: processing time of each module before moving to the next BX
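The per-module time budget follows from TMUX and the clock: a module must be ready for its next event after TMUX bunch crossings of 25 ns. A minimal sketch of that arithmetic, assuming the budget is rounded down to whole clock ticks:

```python
# Per-module time budget vs. TMUX (sketch): assumes a module must be ready for
# the next event after TMUX bunch crossings of 25 ns each.
BX_NS = 25


def budget_ns(tmux: int) -> int:
    """Nanoseconds available per event."""
    return tmux * BX_NS


def budget_ticks(tmux: int, clk_mhz: int = 240) -> int:
    """Whole clock ticks per event: floor(TMUX * 25 ns * f_clk)."""
    return tmux * BX_NS * clk_mhz // 1000


if __name__ == "__main__":
    for tmux in (4, 6):
        print(tmux, budget_ns(tmux), budget_ticks(tmux))
    # 4 100 24
    # 6 150 36
```

Going from TMUX = 6 to TMUX = 4 cuts the budget from 36 to 24 ticks per module at 240 MHz.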
TMUX 6, 240 MHz CLK: overhead in each processing module
TMUX 6, 240 MHz CLK: inter-board communication latency for matches and track outputs
▪ Three-way comparison
▪ Two designs (flat barrel geometry):
▪ Samples
✦ 26 sectors, both designs: Vivado <-> Emulation
✦ Half barrel, one sector: Vivado <-> Hardware
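The comparison above boils down to checking, event by event, that two sources produce the same set of tracks. In this sketch, tracks are assumed to be hashable words grouped per event number, and the per-event comparison ignores output order; the data layout and function names are hypothetical stand-ins for the real output dumps.

```python
# Sketch of a three-way output comparison (emulation vs Vivado simulation vs
# hardware). Tracks are assumed to be hashable words (e.g. ints) grouped per
# event; this layout is a hypothetical stand-in for the real output dumps.
from collections import Counter
from typing import Dict, List

Tracks = Dict[int, List[int]]  # event number -> list of track words


def diff_events(a: Tracks, b: Tracks) -> List[int]:
    """Event numbers whose track multisets differ between a and b."""
    events = set(a) | set(b)
    return sorted(ev for ev in events
                  if Counter(a.get(ev, [])) != Counter(b.get(ev, [])))


def three_way(emu: Tracks, sim: Tracks, hw: Tracks) -> Dict[str, List[int]]:
    """Mismatching events for each pair of sources."""
    return {
        "emulation_vs_vivado": diff_events(emu, sim),
        "vivado_vs_hardware": diff_events(sim, hw),
        "emulation_vs_hardware": diff_events(emu, hw),
    }


if __name__ == "__main__":
    emu = {1: [10, 11], 2: [20]}
    sim = {1: [11, 10], 2: [20]}  # same tracks, different order
    hw = {1: [10, 11], 2: [21]}   # one differing track word
    print(three_way(emu, sim, hw))
```

Comparing multisets rather than ordered lists is a design choice: it tolerates tracks leaving the sink in a different order while still catching missing, extra, or corrupted tracks.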
▪ Send 100 distinct single-muon events from the source to the processing board; repeat many times to test stability over times long compared with the BX and orbit timescales
▪ Compare the final tracks with the expected tracks in the Track Sink and accumulate errors
▪ The error counter is connected to a register that is read out every minute
▪ Unplug a cable to convince yourself the test actually detects errors
▪ DEMONSTRATE: processing boards run error-free over long times
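The stability test logic can be sketched in software. Here the firmware register is modelled as a plain counter, the "every minute" readout is replaced by a readout every N processed events, and `receive` stands in for the tracks arriving at the sink; all of these are hypothetical stand-ins for illustration.

```python
# Sketch of the long-run stability test: replay the same events, compare each
# received track list with the expected one, and accumulate an error counter
# that is read out periodically. The real counter is a firmware register read
# every minute; here it is a plain integer read every `readout_every` events.
from typing import Callable, Dict, List


def run_stability_test(expected: Dict[int, List[int]],
                       receive: Callable[[int], List[int]],
                       n_loops: int,
                       readout_every: int) -> List[int]:
    """Return the error-counter value at each periodic readout."""
    errors = 0
    processed = 0
    readouts: List[int] = []
    for _ in range(n_loops):
        for ev, tracks in expected.items():
            if sorted(receive(ev)) != sorted(tracks):
                errors += 1
            processed += 1
            if processed % readout_every == 0:
                readouts.append(errors)
    return readouts


if __name__ == "__main__":
    expected = {ev: [ev] for ev in range(100)}  # 100 distinct events
    print(run_stability_test(expected, lambda ev: [ev], 3, 100))  # [0, 0, 0]
    # "Unplugged cable": nothing arrives, so every event counts as an error.
    print(run_stability_test(expected, lambda ev: [], 3, 100))    # [100, 200, 300]
```

The second call plays the role of unplugging the cable: a counter that never moves would otherwise be indistinguishable from a test that checks nothing.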
Includes all parts of the algorithm and seeding
NO DUPLICATE REMOVAL OR L1F1
Half-sector almost within reach with current project and technology.
Includes all parts of the algorithm and seeding
Resources per module (LUT Logic, LUT Memory, BRAM, DSP):
LR 64
SL 50 100
VMR 150
AS 12 2
VMS 40/20 60/4 0/1 0/0
TE 38 2 1
SP 22 36
TC 859 184 0.5 51
Tpar 16 4
TProj 69/12 152/4 0/1.5
PT 570 6
Resources per module (LUT Logic, LUT Memory, BRAM, DSP):
PR 260 2
AP 10 1
VMP 22 28
ME 33 1
CM 29 32
MC 600 17
FM 10/12 4/4 1/1.5
MT 747 6
FT 2491 239 18.5 32
TF 49 4
PD 11202 3
▪ Given the resources used by each module, we can estimate how much a full sector needs in the ultimate system, based on an UltraScale+ (VUXP) FPGA
▪ Goal: one SP in one FPGA
– Resource needs are compatible with UltraScale+ resources
              LUT Logic   LUT Memory   BRAM     DSP
Full sector   279733      151191       2721.5   1818
VU3P          32%         81%          85%      80%
VU5P          21%         53%          58%      52%
VU7P          16%         40%          42%      40%
VU9P          11%         27%          28%      27%
VU11P         10%         27%          29%      20%
VU13P         7%          20%          22%      15%

Full sector: ±η; includes barrel, disk, and hybrid.
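The full-sector estimate is a sum over modules of per-module usage times the number of instances, divided by the device capacity. The sketch uses the TC and FT rows of the module tables (the rows with all four values); the instance counts and device capacities in the example are hypothetical placeholders, not the real sector composition or UltraScale+ figures.

```python
# Sketch of the full-sector resource estimate: sum per-module usage times the
# number of instances, then divide by the device capacity. The per-module
# numbers are the TC and FT rows of the module tables; instance counts and
# capacities used below are hypothetical placeholders.
from typing import Dict

RESOURCES = ("lut_logic", "lut_memory", "bram", "dsp")

PER_MODULE: Dict[str, Dict[str, float]] = {
    "TC": {"lut_logic": 859, "lut_memory": 184, "bram": 0.5, "dsp": 51},
    "FT": {"lut_logic": 2491, "lut_memory": 239, "bram": 18.5, "dsp": 32},
}


def sector_totals(instances: Dict[str, int]) -> Dict[str, float]:
    """Total resources for the given number of instances of each module."""
    totals = {r: 0.0 for r in RESOURCES}
    for module, count in instances.items():
        for r in RESOURCES:
            totals[r] += PER_MODULE[module][r] * count
    return totals


def utilization(totals: Dict[str, float], capacity: Dict[str, float]) -> Dict[str, float]:
    """Percentage of each device resource used."""
    return {r: 100.0 * totals[r] / capacity[r] for r in RESOURCES}


if __name__ == "__main__":
    totals = sector_totals({"TC": 10, "FT": 4})  # hypothetical instance counts
    print(totals)
    fake_device = {"lut_logic": 100000, "lut_memory": 50000, "bram": 1000, "dsp": 2000}
    print(utilization(totals, fake_device))
```

Keeping per-module numbers separate from instance counts makes it easy to redo the estimate when the number of modules per sector changes, for example with the tilted barrel geometry.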
▪ High speed link project
layout,
▪ Based on existing g-2 project
▪ Explore different 25 Gb/s technologies, including fiber and copper RTM interconnects
tech for fast turn-around
▪ Kintex and Virtex UltraScale
grade, footprint-compatible
Block diagram: serial link development board and RTM. UltraScale FPGA in B2104 package (32 GTH, 32 GTY; KU095, KU115, VU080, VU095, VU125, VU160, or VU190); Zynq FPGA XC7Z010 (AXI chip-to-chip); Atmel AVR (MMC); QSFP28 cages (4 x 25 Gbps XMIT/RECV, 4 GTY each); Firefly 12-channel transmitters and receivers (GTY and GTH); MTP 24 fiber connectors; SFP at 1 Gbps for IPbus (1 GTH); Ethernet RJ45; DDR3 memory; SDHC card; SPI config flash; USB JTAG; SMA connectors with digital buffers; utility I/O; AMC connector; RTM copper and fiber connects; +12V/+3.3V power and IPMI.
sent
crossing in binary
Latency = Trk_Clk - Start - BX*36
– Trk_Clk: arrival of track at sink
– Start: first stub sent (ev#1)
– 36 ticks of the 240 MHz clock in 150 ns
Until event’s 1st track arrives at sink
Until last possible track arrives at sink
tracks reported.
Half Barrel Hybrid+Disks
(67 RX + 48 TX on CTP7)
Developed at Univ. Wisconsin
▪ The demonstrator project describes a flat barrel tracker
▪ The ultimate project will use the tilted barrel:
✦ Better load balancing leads to smaller truncation effects
✦ The approach and firmware building blocks remain the same