ONSEN Lab Tests and Development, 21st DEPFET Workshop, 28–31 May 2017

slide-1
SLIDE 1

ONSEN Lab Tests and Development

Thomas Geßler Dennis Getzkow Wolfgang Kühn Jens Sören Lange Klemens Lautenbach Simon Reiter

  • II. Physikalisches Institut, Justus-Liebig-Universität Gießen

21st International Workshop on DEPFET Detectors and Applications 28–31 May 2017 Ringberg Castle

JENNIFER (Japan and Europe Network for Neutrino and Intensity Frontier Experimental Research), an MSCA-RISE project funded by the European Union, grant no. 644294

Thomas Geßler (JLU Gießen) ONSEN Lab Tests and Development 21st DEPFET Ws., May 2017 1 / 21

slide-2
SLIDE 2

Debugging of Data Corruption


slide-3
SLIDE 3

Selector Data Flow: “ONSEN Trigger Mismatch” Sources

[Diagram: Selector data flow.
 Blocks: PXD parser, ROI parser, Writer, Reader, Pixel filter, Reformatter, Memory, Addr. FIFO.
 Write table (Event #, Write Addr.): events 11–15 hold Addr(11)–Addr(15); events 40–42 hold Addr(40)–Addr(42); entries 43 and 44 are empty.
 Read table: Event #, HLT trigger?, Read Addr.
 Links: DHC data (6.25 Gbps optical), merged ROIs (MGT or LVDS), filtered data (GbE).]

◮ PXD parser: Sensitive to event/framing errors in PXD data → internal event synchronization goes out of sync
◮ Pixel filter: Produces framing errors if inputs are out of sync; state machine reset not properly implemented → cold start necessary
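
The address bookkeeping on this slide can be illustrated in software. The following Python sketch is an illustration only and not the ONSEN firmware: the class `AddressTable`, its method names, and the `TriggerMismatch` exception are hypothetical names chosen here. It assumes that the writer stores one memory address per event number and that the reader looks that address up when the HLT decision for the same event arrives.

```python
class TriggerMismatch(Exception):
    """Raised when an HLT decision arrives for an event with no stored address."""


class AddressTable:
    """Hypothetical model: event number -> memory address of the stored PXD data."""

    def __init__(self):
        self._addr = {}

    def write(self, event, address):
        # Writer side: remember where this event's PXD data was stored.
        self._addr[event] = address

    def read(self, event, accepted):
        # Reader side: called once per HLT decision for `event`.
        if event not in self._addr:
            raise TriggerMismatch(f"no PXD data stored for event {event}")
        address = self._addr.pop(event)  # the entry is freed in either case
        return address if accepted else None
```

In this model, a single lost or duplicated frame on the PXD side shifts every later event number relative to the HLT stream, which is one way the two inputs can go out of sync.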


slide-4
SLIDE 4

Dennis Getzkow, Firmware Tests, April/May 2017 (slide 2/4)

Test Setup in Gießen

[Diagram: test setup. A PC runs "Fork v2" (corrupts frames on purpose) for the emulated data sources "DATCON", "DHC 1", "DHC 2", and "HLT", each sent over TCP. Optical links connect to the Data Reduction (ONSEN) system: an ATCA shelf with Carrier Boards hosting the MERGER and two SELECTOR nodes, with an RoI connection between them.]

slide-5
SLIDE 5

Dennis Getzkow, Firmware Tests, April/May 2017 (slide 4/4)

Problematic Conditions

◮ Invalid CRC in HLT frame
  ◮ ONSEN Merger discarded this data but also the beginning of the next (valid) HLT frame
  ◮ ONSEN Merger blocked any incoming HLT data after the invalid CRC
  ◮ Coldstart needed
◮ Fragmented first DHH data (cut in the middle of ZSD)
  ◮ Internal buffer management did not end the event properly
  ◮ Resulted in “event fusion” of the first DHH data (DHE data of second event started in ZSD of first event)
  ◮ Corrupted “only” two events
  ◮ No coldstart needed
◮ Double DHC Start frame at the beginning of run
  ◮ Additional / faulty DHC Start was not discarded properly in ONSEN
  ◮ Led to event mismatch but also to internal backpressure (Selector AMC)
  ◮ Softreset needed for getting rid of backpressure, but event mismatch was still occurring
  ◮ Coldstart needed

ONSEN firmware update: these three conditions don’t cause trouble anymore

slide-6
SLIDE 6

ONSEN Emulator

◮ C++ program by S. Reiter: PXD data reduction in software
◮ Loads test data from file (PXD/HLT/DATCON) (requires 0xBE12DA7A header)
◮ Similar memory management as ONSEN
◮ Processing time example: 1000 events, 40 % PXD occupancy = 780 MB pixel data
  ◮ ONSEN: < 2 seconds after sending HLT (1 Selector node)
  ◮ Emulator (Intel i7 @ 3.4 GHz, 16 GB RAM):
    ◮ 11 min, 50 sec with 1 thread
    ◮ 2 min, 40 sec with 8 threads
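
To put the timing example into numbers, the short Python calculation below derives the multi-thread speed-up and the data throughput from the figures quoted on the slide. The variable names are chosen here for illustration. The ONSEN figure is only a lower bound, and it assumes that the "< 2 seconds" covers the full 780 MB sample.

```python
PIXEL_DATA_MB = 780  # from the slide: 1000 events at 40 % PXD occupancy


def seconds(minutes, secs):
    """Convert a 'min, sec' pair into seconds."""
    return 60 * minutes + secs


t_1_thread = seconds(11, 50)   # emulator, 1 thread
t_8_threads = seconds(2, 40)   # emulator, 8 threads

speedup = t_1_thread / t_8_threads
throughput_1 = PIXEL_DATA_MB / t_1_thread    # MB/s, 1 thread
throughput_8 = PIXEL_DATA_MB / t_8_threads   # MB/s, 8 threads
onsen_lower_bound = PIXEL_DATA_MB / 2        # MB/s, assuming < 2 s in total

print(f"speed-up with 8 threads: {speedup:.2f}x")
print(f"emulator throughput: {throughput_1:.2f} MB/s (1 thread), "
      f"{throughput_8:.2f} MB/s (8 threads)")
print(f"ONSEN throughput: > {onsen_lower_bound:.0f} MB/s")
```

The speed-up is about 4.4 for 8 threads, the emulator reaches roughly 1.1 MB/s and 4.9 MB/s, and one Selector node exceeds 390 MB/s under the stated assumption.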

slide-7
SLIDE 7

Test Results and Progress

◮ Several framing and event error cases were reproduced with test data, identified, and fixed
◮ Verified by testing with corrupt test data → patched PXD parser correctly sanitizes PXD input
◮ Correct ONSEN data processing verified with software emulator
◮ Next steps:
  ◮ Replace the patched PXD parser with a rewritten one → increases robustness and adds better error analysis
  ◮ Do the same thing for the DATCON parser
  ◮ Revise Pixel filter input state machine, fix reset

slide-8
SLIDE 8

Debugging of ONSEN Internal Links


slide-9
SLIDE 9

ONSEN Internal Links (for ROI Forwarding)

[Diagram: one Merger (M) and 16 Selector (S) nodes connected through the ATCA backplane (fabric channels); the labels "Events 1, 5, …" and "Events 4, 8, …" mark which events are forwarded to which nodes.]

◮ 4×600 Mbps LVDS Carrier-AMC links
◮ 3.125 Gbps MGT ATCA backplane links

slide-10
SLIDE 10

MGT-Links on ATCA Backplane

[Diagram: ATCA shelf with slots 1–14 and the MGT links between them on the backplane.]

◮ For Belle II, ONSEN will be scaled from 2 to 9 ATCA boards
◮ First scaling tests showed troubling results: with “Merger” firmware sending to multiple boards, all backplane links become unstable
◮ Debugged in detail by S. Reiter in Gießen
◮ Problem: Crosstalk between Ethernet IO and one MGT power supply
◮ Solved by avoiding that link → use different ATCA slots
◮ Additionally, Aurora reset logic had to be revised

slide-11
SLIDE 11

LVDS-Links Between Carrier and AMCs

[Diagram: LVDS links between Carrier FPGA and AMC FPGA (4 × 600 Mbps per AMC). Labels: Serial clock and serial data (300 MHz), System clock (100 MHz), Fanout Chip, Carrier FPGA with IDELAY, AMC FPGA with IDELAY, DCM, and PLL.]

◮ Connection Carrier FPGA ↔ AMC FPGA uses serial (LVDS) links
◮ Serial clock is distributed from Carrier to AMCs
◮ Clock/data phase shift is compensated by a delay, determined by tuning
◮ Problem 1: Strong delay difference between Carrier/AMC combinations
◮ Problem 2: Small temperature drift of the delay
◮ Solved by implementation of an online self-calibration mechanism
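
The slides do not describe how the self-calibration works. A common approach to this kind of delay tuning is to scan all delay taps, find the longest error-free window, and use its center. The Python sketch below shows that approach under stated assumptions: `link_ok` is a hypothetical callback that reports whether test data is received correctly at a given tap, and the default of 64 taps is an assumption about the IDELAY element, not a figure from the talk.

```python
def best_delay_tap(link_ok, n_taps=64):
    """Return the tap at the center of the longest error-free window, or None.

    link_ok(tap) -> bool is a hypothetical link test supplied by the caller.
    """
    best_start, best_len = None, 0
    start = None
    for tap in range(n_taps + 1):  # one extra step closes a trailing window
        good = tap < n_taps and link_ok(tap)
        if good and start is None:
            start = tap                      # a new good window opens
        elif not good and start is not None:
            length = tap - start             # the window just closed
            if length > best_len:
                best_start, best_len = start, length
            start = None
    if best_start is None:
        return None                          # no working tap found
    return best_start + (best_len - 1) // 2
```

Running such a scan per Carrier/AMC pair addresses the differences between board combinations, and repeating it during operation would re-center the delay after a temperature drift.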


slide-12
SLIDE 12

Link Tests After Fixes

◮ Forward HLT packets from
  ◮ 1 Merger in
  ◮ 1 Merger-Carrier to
  ◮ 4 Selector-Carriers with
  ◮ 2 Selectors each
◮ Selector output recorded and verified (i.e., compared to expected output from emulator)
◮ 60-hour test with low rate (∼10 Hz): no link or data errors
◮ 30-minute test at 2 kHz: no link or data errors
◮ Short test with 30 kHz and DATCON data: no link or data errors
◮ Next: Full-scale test (8 Selector-Carriers) when all boards return from PERSY and repair at IHEP

slide-13
SLIDE 13

Phase 2 Readiness


slide-14
SLIDE 14

ONSEN (Phase 2) Shelf in Tsukuba B3: Damage

◮ Chassis of the ONSEN Prototype ATCA Shelf (planned for Phase 2 ONSEN) was damaged/warped during shipment to KEK

slide-15
SLIDE 15

ONSEN 19-Inch Rack in E-Hut

◮ The 19-inch racks foreseen for ONSEN in the E-hut don’t have enough clearance to accept the deformed shelf

slide-16
SLIDE 16

ONSEN Phase 2 Preparation: Outlook

◮ KEK ONSEN shelf must be replaced
  ◮ Buy a new shelf or
  ◮ Send replacement from Gießen (2-slot shelf with RTM slots sufficient)
◮ DAQ group offered to buy a shelf for R&D and lend it to ONSEN during Phase 2 → will be discussed at B2GM
◮ Two ONSEN experts will be at KEK for one month from September, more from January 2018
◮ Support for ONSEN team at KEK during Phase 2 and VXD cosmic data taking provided by JENNIFER

slide-17
SLIDE 17

Compute Node Upgrade for PANDA (Design and Production by the IHEP Beijing Trig Lab)


slide-18
SLIDE 18

Compute Node Upgrade: Carrier Board

◮ First stage: upgrade CNCB (but remain compatible with current xFP)
◮ FPGA: Change to Xilinx UltraScale architecture

                Virtex-4 FX60    Virtex-5 FX70T   Kintex UltraScale 060
                (CNCB)           (xFP)            (Upgrade)
  Registers     50k              44k              663k
  LUTs          50k × 4-input    44k × 6-input    332k × 6-input
  DSP Slices    128              128              2760
  BRAM          4 Mb             5 Mb             38 Mb
  MGT           16 × 6.5 Gbps    16 × 6.5 Gbps    32 × 16.3 Gbps
  CPU           PPC405           PPC440           none

◮ No more hard-core CPU → Slow control on MicroBlaze or light-weight option like IPbus

slide-19
SLIDE 19

Compute Node Upgrade: Carrier Board

◮ RAM: 2 GiB DDR2 SODIMM → 16 GiB DDR4 (8 chips)
◮ Configuration: Flash/CPLD (slave serial) → automatic from NOR Flash (master BPI)
◮ GbE switch: 4 AMCs, 1 switch FPGA, 1 uplink to ATCA Base Interface
◮ 16.3 Gbps MGTs
  ◮ 4 links to each AMC card (currently: 4 × 600 Mbps LVDS)
  ◮ 14 links to ATCA backplane
  ◮ 1 link to RTM (10G Ethernet)
◮ Programmable MGT clock
◮ Keep:
  ◮ JTAG chain/AMC decoupling
  ◮ I2C buses, sensors
  ◮ IPMC connector

slide-20
SLIDE 20

Compute Node Upgrade: Rear-Transition Module (RTM)

◮ On-board USB-JTAG programmer (Digilent)
◮ UART-USB interface for 4 AMC cards + switch FPGA
◮ USB hub for UART interfaces, IPMC
◮ SFP+ cage for switch-FPGA 10G Ethernet

slide-21
SLIDE 21

CN_V4 Design Status

Jingzhou ZHAO, Zhen-An LIU, Wenxuan GONG Trigger Lab, IHEP, Beijing

slide-22
SLIDE 22

CN_V4 Block diagram

slide-23
SLIDE 23
slide-24
SLIDE 24

CN_V4 design status

  • Schematic of the Carrier board is under design.
slide-25
SLIDE 25

Compute Node Upgrade: Current Status and Belle II Application

◮ Update: Schematics finished, PCB design ready in about two months
◮ First prototypes expected later this year
◮ Possible replacement for ONSEN carrier boards (only 2 spares) → requires significant firmware changes
◮ Possible upgrade path for ONSEN hardware

slide-26
SLIDE 26

Firmware-Version Management


slide-27
SLIDE 27

Firmware-Version Management

◮ Problem: FPGA firmware version (e.g., DHH, DATCON, ONSEN) should be identifiable online and in data files
◮ Idea: Store the first 32 bits of the git commit hash in the FPGA’s USERCODE or USR_ACCESS (≥ Virtex-5) register

    bitgen -g UserID:0x$(git rev-parse --short=8 HEAD)

◮ Make it accessible to EPICS through a PV, and log it
◮ Maybe put look-up table in EPICS to pretty-print tagged versions

    git tag | while read t; do echo $(git rev-parse --short=8 $t) $t; done

    d5cb6ce8 v1.00.a
    10eb7c50 v1.01.a
    ...

◮ According to Dima, DHH already implemented a similar mechanism:
  ◮ Store a time stamp and board number in USR_ACCESS
  ◮ Write the time stamp to a git tag to identify the commit
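
The look-up idea can be sketched in a few lines of Python. This is an illustration under assumptions, not an EPICS implementation: the function names `parse_tag_table` and `describe_version` are hypothetical, and the input is assumed to be the "hash tag" listing produced by the `git tag` loop on this slide.

```python
def parse_tag_table(text):
    """Parse lines of '<8 hex digits> <tag>' into {32-bit value: tag}."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines such as '...'
        short_hash, tag = parts
        table[int(short_hash, 16)] = tag
    return table


def describe_version(usercode, table):
    """Return the tag name for a 32-bit register value, else the bare hash."""
    return table.get(usercode, f"untagged ({usercode:08x})")


tags = parse_tag_table("d5cb6ce8 v1.00.a\n10eb7c50 v1.01.a\n...\n")
print(describe_version(0xD5CB6CE8, tags))  # tagged build
print(describe_version(0x12345678, tags))  # development build
```

Falling back to the bare hash keeps untagged development builds identifiable, since the 8 hex digits can still be resolved with git.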