Data Handling
Andrew Norman

Talk Overview
  - Infrastructure & Tools
  - Data Transport
  - Monitoring
  - Operations
FIFE Workshop Data Handling

The Problem
  - Moving data is hard
  - We have a LOT of data

IF Computing Infrastructure
  [Diagram. Labels: FNAL Computing; Users; (Heterogeneous) Batch System; CVMFS; FermiGrid; Open Science Grid; Commercial Clouds; CDF/DZero Clusters; Central Storage (BlueArc); Disk Cache (dCache etc.); Tape Storage (Enstore)]

FNAL Storage
  - Access to these systems is not always intuitive.
  - “Common sense” tasks can have unintended consequences.
  - Need optimized brokers to understand the infrastructure and guard it.
  - Note: “File Families” are arbitrary labels that allow data to be mapped to a physical set of tapes.
  [Diagram. Labels:
     Central Disk System: BlueArc; Network Head; Disk Volume
     Disk Cache System: dCache; PNFS (/pnfs/exp/); Un-backed (Volatile) Scratch; Tape Backed dir-a (File Family), dir-b (File Family); Disk Pools; Write Pool; Raw (Write Pool); dCache Location Database
     External Access Doors: xrootd, gridftp, srm, webdav, ...
     Tape System: Enstore; Tape Libraries; Database; SFA Staging Server; SFA Disk]

Tools
  - SAM and SAMWeb
  - SAM Catalog Browsers
  - File Transfer Service
  - IFDH

Sequential Access w/ Metadata (SAM)
  - SAM is a combination of brokers and databases which OPTIMIZE access to large sets of data
    - Replica catalogs
    - Managed [site] caches
    - Storage media specific optimizations
      - Pre-staging mechanisms
      - Minimize TAPE mounts
  - Data catalog services
    - Dataset definition
    - Production level accounting and recovery
    - Data processing project management
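To make the "minimize tape mounts" point concrete, here is a toy sketch, not SAM's actual broker logic. The file names and volume labels are made up, and a "mount" is counted whenever the next requested file lives on a different volume from the one currently loaded. Grouping requests by volume is one simple way a broker could cut the number of mounts.

```shell
#!/usr/bin/env bash
# Toy illustration only: file names and tape volume labels are invented.
# Each line is "<file> <tape volume>", in the order the files were requested.
requests="fileA VOL002
fileB VOL001
fileC VOL002
fileD VOL001
fileE VOL002"

# Count mounts for a request list on stdin: a mount happens whenever the
# volume of the next file differs from the volume currently loaded.
count_mounts() {
  awk '$2 != loaded { mounts++; loaded = $2 } END { print mounts + 0 }'
}

naive_mounts=$(printf '%s\n' "$requests" | count_mounts)
grouped_mounts=$(printf '%s\n' "$requests" | sort -k2,2 | count_mounts)

echo "in request order:  $naive_mounts mounts"
echo "grouped by volume: $grouped_mounts mounts"
```

With the five requests above, serving them in request order swaps tapes on every file, while grouping by volume needs one mount per volume.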

SAMWeb
  - Modern HTTP-based client/server tools
  - Simplifies client access to SAM functionality
    - Eliminates the need for dedicated SAM stations at sites
    - Allows experiments universal access to SAM resources from non-FNAL locations
    - Allows cross-platform access to the SAM toolset (Linux/Unix, OSX, anything that can run Python or talk HTTP)
  - Improves upon the functions/tasks people really use
    - Simplified function calls
    - Optimizations to common tasks (e.g. multi-file and bulk operations)

File Transfer Service
  - Handles large scale organization & migration of files
    - Robust/Paranoid mode for Online/DAQ environments
    - High throughput/Permissive mode for Offline environments
  - Simplifies “how” files are registered with data catalogs
    - Operates with the concept of “drop boxes” and rule sets
    - Simplifies managed file replication and hierarchical organization
  - Designed to scale to “production” levels

IFDH
  - Swiss army knife of file delivery
  - Designed to be a lightweight toolkit to handle the last leg of file delivery
    - “Smart” broker with location awareness
    - Integrated with SAM data catalogs
    - Modular system for transfer protocols
      - Provides single end user interface and syntax
      - Allows for workflows with “mixed” transport requirements
    - Handles authentication and certificate generation for FNAL users
    - Bidirectional operation (i.e. copy-in and copy-out)
      - Includes bulk copy operations
  - Most end users only need IFDH
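The idea of one user-facing syntax in front of several transfer protocols can be sketched as below. This is a hypothetical illustration: `pick_transport` is not part of IFDH, the host `example.invalid` is a placeholder, and the mapping from location form to protocol is an assumption made for the example (the protocol names are the access doors listed on the storage slide).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick_transport is NOT an IFDH function. It only mimics
# the idea of choosing a transfer protocol from the form of a file location.
pick_transport() {
  case "$1" in
    root://*|xroot://*)  echo "xrootd" ;;
    gsiftp://*)          echo "gridftp" ;;
    srm://*)             echo "srm" ;;
    https://*|http://*)  echo "webdav" ;;
    /pnfs/*)             echo "dcache" ;;      # path inside the dCache namespace
    *)                   echo "local-copy" ;;  # anything else: plain local copy
  esac
}

# The caller uses the same syntax regardless of where the file lives.
for src in root://example.invalid/pnfs/exp/raw/f1.raw /pnfs/exp/raw/f2.raw /tmp/f3.raw; do
  printf '%-45s -> %s\n' "$src" "$(pick_transport "$src")"
done
```

In real use the end user does not write this dispatch at all: a single `ifdh cp <source> <destination>` call is the interface, and the protocol choice happens behind it.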

What’s New
  - SAM
    - Easier deployment
    - New streamlined scheme
    - New user level documentation
    - Optimizations to servers/stations
      - dCache/Enstore + SFA
    - Integration with postgres databases
  - SAMWeb
    - Registered locations -> “access schema” translation
      - dCache, xrootd
    - New Authentication and Administration interfaces
    - Integration with dCache
      - Many functions optimized for dCache access methods
    - New dataset management options (deletes, renames, etc.)

What’s New (cont.)
  - FTS
    - Simplified configuration
    - Integrated with dCache
      - Permits use of “volatile” pool for intermediate copyback
      - Optimized for dCache specific access methods
    - “Standard” recipes now provided for common uses
      - ART framework files designed to work transparently
      - Auxiliary tools, modules and services included in toolkit
  - IFDH
    - Expanded support for access methods (dCache, xroot, etc.)
    - Bulk transfer methods
    - Background transfer services
    - Simplified “smart” authentication

SAM & SAMWeb: Tricks

Basic Data Sets
  - Define a dataset based on some “tier” and metadata selection criteria

    # Set up SAMWeb (it's a UPS product)
    export PRODUCTS=/grid/fermiapp/products/common/db/:$PRODUCTS
    setup sam_web_client <version>

    # Get a certificate
    kx509

    # Selection criteria (the number printed is the count of matching files)
    samweb count-files "data_tier raw"
    1641854

    # Additional selection criteria
    samweb count-files "data_tier raw and online.detector fardet"
    1415308

Basic Data Sets (cont.)
  - With enough criteria, select just the data you want:

    samweb count-files "data_tier raw and online.detector fardet and start_time > '2014-06-15T23:59:59'"
    5257

  - Create a “name” for the selected data:

    samweb create-definition fardet_data_today "data_tier raw and online.detector fardet and start_time > '2014-06-15T23:59:59'"

  - Can now use this dataset for analysis/production
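Because the same selection string is passed to both `count-files` and `create-definition`, it can help to build it once. The sketch below is one possible way to do that. `join_criteria` and `run` are helper names invented for this example, and `DRY_RUN` is an assumed convention that prints the commands instead of executing them, so the sketch works on a machine without `sam_web_client`.

```shell
#!/usr/bin/env bash
# join_criteria (hypothetical helper): join selection criteria with "and"
# to form a single dimension string.
join_criteria() {
  local query="$1"; shift
  local c
  for c in "$@"; do
    query+=" and $c"
  done
  printf '%s\n' "$query"
}

dims=$(join_criteria "data_tier raw" \
                     "online.detector fardet" \
                     "start_time > '2014-06-15T23:59:59'")

# run (hypothetical helper): with DRY_RUN=1 (the default here) print the
# command instead of executing it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'would run:'; printf ' %q' "$@"; printf '\n'
  else
    "$@"
  fi
}

run samweb count-files "$dims"
run samweb create-definition fardet_data_today "$dims"
```

Setting `DRY_RUN=0` in an environment where `samweb` is set up and a certificate is available would execute the two commands from the slide with the composed string.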

Advanced Data Sets
  - Datasets are dynamic.
    - They are recalculated each time they are requested.
  - Draining dataset pattern
    - Looks for children
    - Use with a job that makes children
    - Dataset size approaches zero as you run (it shrinks as output is produced)
    - Auto recovery: a file whose job produced no child still matches the definition, so it is picked up again on the next pass

    samweb count-files "data_tier raw and not isparentof:( data_tier artdaq and daq2rawdigit.base_release 'S14-01-20' ) and online.detector fardet and online.totalevents > 0"
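The draining pattern can be modelled without SAM at all. The following toy sketch uses made-up file names and two invented helpers, `pending` and `process`. `pending` plays the role of re-evaluating the `not isparentof:( ... )` definition, and `process` stands in for a job that declares a child file only when it succeeds.

```shell
#!/usr/bin/env bash
# Toy model of the draining pattern; file names are invented and no SAM calls are made.
raw_files=(raw_001 raw_002 raw_003 raw_004)
with_child=" "   # space-delimited names of raw files that already have a child

# Re-evaluate the "dataset": raw files that have no child yet.
pending() {
  local f
  for f in "${raw_files[@]}"; do
    case "$with_child" in
      *" $f "*) ;;          # a child exists, so this file has drained out
      *) echo "$f" ;;       # still needs processing
    esac
  done
}

# A successful job declares a child for its input; a failed job declares nothing.
process() {
  local file="$1" status="$2"
  if [ "$status" = "ok" ]; then
    with_child+="$file "
  fi
}

echo "before: $(pending | wc -l | tr -d ' ') files"
process raw_001 ok
process raw_002 failed
process raw_003 ok
echo "after first pass: $(pending | tr '\n' ' ')"
```

After the first pass only `raw_002` (failed) and `raw_004` (not yet run) remain, which is the auto recovery behaviour: resubmitting against the same definition picks up exactly the files that still lack output.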

Advanced Data Sets (cont.)
  - Can use parentage to specify different types of complex relationships
    - Can do peers, mixing, etc.
  - Preserves the full parentage of every file
    - Files inherit meta-info
    - Fully trackable
  [Diagram. Labels: Raw; 1st Reco; Primary Input; Mixing Branch; Mixed Output; Mixed Output]

Projects and Monitoring

Detailed FTS Monitoring

Tailored Web Interfaces
  - Web interfaces are tailored to the experiment’s data catalogs
    - Data tiers
    - Specific metadata
  - Provides the “novice” interface for new users

Data Handling Service
Offline Production Operations Group

Offline Production Operations Group (OPOG)
  - New group formed to address production needs of Fermilab experiments
  - Designed to assist and/or run the large scale experiment workflows (simulation, reconstruction, etc.)
  - Based on requests from MINOS, NOvA, MINERvA
  - Starting operations now
    - Marek Z. (MINOS)
    - Jenny T. and Paola B. start July 14th
  [Diagram. Offline Prod Ops Group (OPOG): Physicist (New Hire); Marek Zielinski (MINOS); Jenny Teheran (Operator); Paola Buitrago (Operator)]

Scope
  - The group is patterned on the CMS operations group
  - Provides skilled “operators” who are able to:
    - Submit, monitor, validate and triage the large scale experiment “production” work
  - Targeted at experiments’ needs for dedicated personnel who can:
    - Understand the grid processing infrastructure and successfully:
      - Run “keep up” processing of detector data
      - Submit large scale simulation jobs
      - Submit large scale reconstruction passes
      - etc.
  - Augments the experiments’ own offline groups with additional operators

Scope (cont.)
  - The group members are not technically “developers”
    - They will understand the general workflows but are not the programmers who work with the experiment on their scripts/code
    - However:
      - Jenny and Paola are actually computer scientists with extensive development work in workflow management and cloud computing
  - They provide feedback to the experiments and to SCD service groups (i.e. diagnose/report problems)
    - They coordinate across multiple requests from the experiments to get the work done (e.g. balance the “keep up” with the latest “sim” request)
    - They can provide feedback to Liaisons about activities outside their experiment

OPOG
  [Diagram. Labels:
     SCD Service Groups: Data Handling; Grid; Cloud; Storage; etc.
     Experiment Scientists
     Experiment Offline Production: Sim; Reco; etc.
     Experiment to service/devel
     OPOG: Physicist (New Hire); Marek Zielinski (MINOS); Jenny Teheran (Operator); Paola Buitrago (Operator)]
