Data Format and Packaging, An Update. Kurt Biery, 18 March 2020.



SLIDE 1

Data Format and Packaging, An Update

Kurt Biery 18 March 2020 DUNE DAQ Dataflow Working Group Meeting

SLIDE 2

Data ‘Format’

At the DAQ workshop, it was proposed to focus our data format investigations on

  • A DUNE-specific binary format stored in HDF5 files

In the (admittedly small number of) subsequent discussions, this proposal has been received positively. Eric Flumerfelt has done preliminary work demonstrating the writing of artdaq::Fragments (a la PDSP) in HDF5. Next steps: share what has been done so far with a few more technical experts from offline and online, gather feedback, and run tests (encoding/decoding speed, etc.).
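As a purely illustrative sketch of what "a DUNE-specific binary format" can mean in practice, the Python below packs a fragment as a hypothetical fixed-size header followed by a raw payload; the field list and layout are invented for this example and are not the format used in the HDF5 work mentioned above. Blobs like these could then be stored as datasets inside an HDF5 file.

```python
import struct

# Hypothetical fragment layout, for illustration only: a fixed-size
# binary header followed by a raw payload. The real DUNE-specific format
# was still being designed at the time of this talk; the fields here
# (trigger number, geographic ID, window start/end timestamps, payload
# size) are invented for the sketch.
HEADER_FMT = "<QQQQQ"  # five little-endian 64-bit unsigned integers
HEADER_SIZE = struct.calcsize(HEADER_FMT)

def pack_fragment(trigger_num, geo_id, t_start, t_end, payload):
    """Prepend the fixed-size header to a raw payload."""
    header = struct.pack(HEADER_FMT, trigger_num, geo_id,
                         t_start, t_end, len(payload))
    return header + payload

def unpack_fragment(blob):
    """Split a packed fragment back into its header fields and payload."""
    trigger_num, geo_id, t_start, t_end, size = struct.unpack_from(HEADER_FMT, blob)
    payload = blob[HEADER_SIZE:HEADER_SIZE + size]
    return (trigger_num, geo_id, t_start, t_end), payload
```

Round-tripping a fragment through pack/unpack is the kind of encoding/decoding-speed test mentioned above.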

18-Mar-2020 2 Data Format and Packaging Update

SLIDE 3

Data ‘Packaging’

‘Packaging’ ~= ‘grouping and subdividing’

  • Determining how file boundaries are managed…
  • What appears in each file…

At the DAQ workshop, we used different ‘types of data’ as a starting point for discussion, which may have been misleading. Here, I’d like to start with different types of packaging and come back to different types of data later…

SLIDE 4

Data packaging choices (parameters)

Some of the parameters that can be used to specify how data is grouped into files:

  • 1. Whether or not the data in each file on disk will have geographically complete coverage (superset; the Trigger Decision has details)
    • If not, what subdivision will be used
  • 2. The maximum size of the files that will be created
  • 3. The maximum time interval/duration that will be stored in a single file (data time or wall clock time both seem possible)
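As a sketch only, the parameters above could be captured in a small structure like the following; the field names and default values are invented for illustration, not taken from any DUNE schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PackagingParams:
    """Sketch of the packaging parameters on this slide.
    Field names and defaults are invented for illustration."""
    geographically_complete: bool            # parameter 1
    subdivision: Optional[str] = None        # sub-choice of 1: e.g. "APA"
    max_file_size_bytes: int = 4 * 1024**3   # parameter 2
    max_duration_seconds: float = 1800.0     # parameter 3 (data or wall-clock time)
```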

SLIDE 5

Priority of the choices

Given the choices described on the previous slide, we can imagine sets of answers/values for #1, #2, and #3 that can’t simultaneously be satisfied. So we would need to specify which one(s) are the most important. For example:

  • For trigger type Y during normal running, the file size specification is the most important.

SLIDE 6

Part of the configuration for DF?

Should we (Dataflow subsystem) support a set of configuration parameters, keyed by data type, that specifies how the data for that data type is packaged? I believe that we can identify the set of parameters that will be needed to specify how the data files for a given data type should be handled. Discussion of the parameter values can, and should, be deferred until closer to data taking.
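A minimal sketch of such a configuration, keyed by data type, might look like the following; all key names and values are invented for the sketch and are not a proposed DUNE schema.

```python
# Purely illustrative configuration block, keyed by data type as the
# slide proposes; key names and values are invented.
PACKAGING_CONFIG = {
    "beam_trigger": {
        "integer_trigger_records": True,
        "geographically_complete": True,
        "max_file_size_bytes": 4 * 1024**3,
        "max_duration_seconds": 1800,
    },
    "snb_trigger": {
        "integer_trigger_records": False,
        "geographically_complete": False,
        "subdivision": "APA",
        "max_file_size_bytes": 4 * 1024**3,
        "max_duration_seconds": None,  # TBD; deferred until closer to data taking
    },
}

def packaging_for(data_type):
    """Look up the packaging parameters for a given data type."""
    return PACKAGING_CONFIG[data_type]
```

Deferring the parameter values, as the slide suggests, then amounts to editing this configuration rather than changing code.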

SLIDE 7

Easing back into data types…

My sense is that we have two high-level types:

  • Triggered data
    • Trigger Records that are produced in response to a Trigger Decision
  • Streaming data
    • Data that is collected without Trigger Record boundaries
    • E.g. WIB debugging data
    • The Trigger Primitive stream might also fit in this category

SLIDE 8

Data packaging choices, take 2

For Triggered data:

  • 1. Whether each file on disk will have an integer number of Trigger Records, or whether each file can have a fractional number of Trigger Record(s)

For both Triggered and Streaming data:

  • 2. Whether or not the data in each file on disk will have geographically complete coverage (superset; the Trigger Decision has details)
    • If not, what subdivision will be used
  • 3. The maximum size of files that will be created
  • 4. The maximum time interval/duration that will be stored in a single file (data time or wall clock time both seem possible)

We will still need to specify priority among these…

SLIDE 9

Different ‘types of data’

Types mentioned in earlier discussions:

  • Local Trigger Records – e.g. beam triggers
  • Extended Trigger Records – e.g. SNB triggers
  • Trigger Primitive stream – all TPs
  • WIB debugging stream – temporary stream that can be enabled for debugging

Others may be mentioned/proposed over time…

SLIDE 10

Possible choices for 1 data type

Beam Trigger Records:

  • Integer number of Trigger Records per file: Yes
  • Geographically complete data in each file: Yes (TPC, PDS, Trigger, Timing; superset, Trigger specifies details in the Trigger Decision)
  • Maximum file size: &lt;optimized for offline use&gt;
  • Maximum time duration per file: TBD (0.5 hour?)
  • Priorities: TBD; for example:
    1. If TR size < max_file_size, integer # of TRs; otherwise, file size
    2. Etc.

** These value choices are for illustration only. If we support configurable data packaging in the Dataflow subsystem, then the values can be changed, under the direction of the appropriate physics groups, offline folks, online folks, etc.
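The example priority rule ("if TR size < max_file_size, integer # of TRs; otherwise, file size") can be sketched as a single boundary check. The function and parameter names below are invented for illustration.

```python
def should_close_before(current_file_bytes, next_tr_bytes, max_file_size):
    """Sketch of the example priority rule: keep an integer number of
    Trigger Records per file when a TR fits under the size limit;
    otherwise let the file-size limit win. Returns True when the current
    file should be closed before writing the next Trigger Record."""
    if next_tr_bytes >= max_file_size:
        # Oversized TR: the file-size limit wins and the TR is split
        # across files, so we don't force a close on the TR boundary.
        return False
    # Normal TR: preserve an integer number of TRs per file by closing
    # whenever the next TR would push the file past the size limit.
    return current_file_bytes + next_tr_bytes > max_file_size
```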

SLIDE 11

Possible choices for a 2nd data type

Supernova Burst Trigger Records:

  • Integer number of Trigger Records per file: No
  • Geographically complete data in each file: No
    • Files split by APA (for example); PDS, etc. details TBD
  • Maximum file size: &lt;optimized for offline use&gt;
  • Maximum time window per file: TBD
  • Priorities: TBD; for example:
    1. File size
    2. Etc.

** These value choices are for illustration only.

SLIDE 12

Possible choices for a 3rd data type

The Trigger Primitive Stream:

  • Integer number of Trigger Records per file: n/a
  • Geographically complete data in each file: Yes (TPC, PDS, Trigger, Timing; superset, subdetector components which don’t have TPs won’t contribute)
  • Maximum file size: &lt;optimized for offline use&gt;
  • Maximum time window per file: TBD
  • Priorities:
    1. File size
    2. Etc.

** These value choices are for illustration only. If we support configurable data packaging in the Dataflow subsystem, then the values can be changed, under the direction of the appropriate physics groups, offline folks, online folks, etc.

SLIDE 13

Possible choices for a 4th data type

The WIB Debug Stream:

  • Integer number of Trigger Records per file: n/a
  • Geographically complete data in each file: No
  • Files split by <TBD>
  • Maximum file size: <optimized for offline use>
  • Maximum time window per file: TBD
  • Priorities:
    1. File size
    2. Etc.

** These value choices are for illustration only.

SLIDE 14

Comments

1. Choosing to support this configurability does not necessarily mean that we will need to build a general-purpose rules engine. The options aren’t that numerous; we could simply encapsulate them in a class.

2. Remember that we’re talking about interfaces here… data handoff, and the specification of the packaging. Implementation details within both the online and the offline have freedom…

3. New trigger types that have readout windows in the range of 10-100 seconds can easily be supported by a configurable DF data packaging system – the packaging config would be part of the proposal from the physics group or whomever.

4. Files wouldn’t necessarily need to have consistent “spans” (time window or number of TRs) [metadata files discussed next slide]
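As a sketch of comment 1, the options could be encapsulated in a plain class carrying an ordered priority list, rather than a rules engine. All names and the two example limits below are invented for illustration.

```python
class DataPackagingPolicy:
    """Sketch of comment 1: packaging options encapsulated in a plain
    class with an ordered priority list, instead of a general-purpose
    rules engine. Names and limits are invented for illustration."""

    def __init__(self, max_file_size, max_duration,
                 priorities=("file_size", "duration")):
        self.max_file_size = max_file_size
        self.max_duration = max_duration
        self.priorities = priorities

    def limit_reached(self, file_bytes, file_duration):
        """Return the highest-priority limit that has been reached,
        or None if the file can keep growing."""
        exceeded = {
            "file_size": file_bytes >= self.max_file_size,
            "duration": file_duration >= self.max_duration,
        }
        for name in self.priorities:
            if exceeded[name]:
                return name
        return None
```

Reordering the `priorities` tuple per data type is one simple way to express "which limit wins" without any rule language.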

SLIDE 15

Ideas

  • 1. Data challenge in Feb 2021
  • 2. Metadata and manifest files…
    • Metadata file for each raw data file
    • Manifest file for each TR that spans multiple files
    • Metadata could instead be internal to the raw data file
    • Sample metadata information for SNB files:
      • the trigger number/identifier
      • the APA number (or whatever geographic identifier(s) are appropriate)
      • the beginning and ending timestamps of the trigger window (or start time and window size)
      • the beginning and ending timestamps of the interval that is covered by the individual file (or start time and window size)
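The sample metadata fields for SNB files listed above could be serialized as a small JSON record; every field name and value below is invented for illustration.

```python
import json

# Illustrative metadata record for one SNB file, with one entry per item
# in the sample list above; field names and values are invented.
metadata = {
    "trigger_number": 123456,
    "geo_id": {"apa": 3},
    "trigger_window": {"start_ts": 1_000_000_000, "end_ts": 1_000_100_000},
    "file_interval": {"start_ts": 1_000_010_000, "end_ts": 1_000_020_000},
}

metadata_json = json.dumps(metadata, indent=2)
```

The same record could live in a sidecar metadata file or be embedded in the raw data file itself, as the slide notes.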

SLIDE 16

Backup slides

SLIDE 17

Some topics that have come up

Where to save information about which components are in the partition. In each data file? Each metadata file? (configuration archive, for sure) Reminder: partitions do not span detector cryomodules.

SLIDE 18

Reminder about Tom’s requirements

Tom has summarized the following requirements:

1. longevity of support
2. integrity checks – for the file format as well as the data fragments
3. ability to read in small subsets of the trigger records and drop from memory data no longer being used
4. ability to navigate through a trigger record to get the adjacent time or space samples
5. compression tools
6. browsable with a lightweight, interactive tool
7. ability to handle evolution of data formats and structure gracefully with backward compatibility ensured

https://wiki.dunescience.org/wiki/Project_Requirement_Brainstorming#Data_Format
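Requirement 3 is essentially about partial reads. As a format-agnostic sketch using only the standard library, seeking into a plain file and reading a slice illustrates the access pattern; HDF5 provides the same capability natively through chunked datasets (not shown here, to keep the sketch stdlib-only).

```python
import os
import tempfile

def read_slice(path, offset, length):
    """Read `length` bytes starting at `offset`, without loading the
    whole file into memory."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Tiny demonstration: write ten bytes, then read a four-byte slice.
_fd, _path = tempfile.mkstemp()
os.close(_fd)
with open(_path, "wb") as f:
    f.write(b"0123456789")
sliced = read_slice(_path, 3, 4)
os.remove(_path)
```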

SLIDE 19

Notes on trigger & stream types

  • 'Local Trigger Records' in Georgia's summary of the Aug/Sep 2019 data model discussions, which is on pages 2-4 here - these data are made up of Trigger Records from triggers that specify the readout of the detector for time intervals measured in milliseconds. The Trigger system will decide which fraction of a single detector module to include in each Trigger Record, as we've discussed many times before.
  • Supernova burst data - the ~100 seconds of readout of a full detector module resulting from a supernova burst trigger
  • The Trigger Primitive stream - the constant flow of Trigger Primitives, which is planned to be persisted by the DAQ system (in addition to being delivered to the Trigger system for use in generating Trigger Decisions)
  • A WIB debug stream - an available, but not constant, stream of data (raw waveforms?) from a manageable number of TPC electronics channels that is used for debugging. This stream would be enabled and disabled by authorized humans, when it is needed for debugging.
  • The identification of these different types of data is not intended to imply anything about how they are handled operationally, or how much data they produce, or what fraction of the data is transferred to the offline, etc.
