Specifying Workflows
Lance M Evans, Cray Inc, 2016-05-03

Typical I/O Subsystem

Customer Workflow Specifications
● Every workflow is unique
● Workflows within a vertical market are similar (but never identical)
● Storage and I/O are called out when something is wrong
● The devil’s in the details
● Customer knowledge varies
  ● May “think” they know how data flows through their systems
  ● May not know about opportunities for improvement
  ● Some consider their workflow a differentiator
  ● HPC users run similar well-tuned workloads repeatedly
  ● Analytics users are usually highly aware of workflow

Use Cases
● All-Read Query
  ● Absorb and preprocess constant sensor data to a staging area
  ● Load massive amounts of data into a number of SSD servers
  ● Perform parallel queries against massive servers
  ● Expunge data when it is stale, and repeat
● GPU Load
  ● Generate a video & photo data set with millions of images, 100s of GB
  ● Load identical data sets into hundreds of computers at once
  ● Iteratively process data through machine learning algorithms
  ● Synchronize many parallel activities and verify convergence
● Checkpoint and More
  ● Burst sequentially to a bandwidth-optimized medium; destage to capacity tier
  ● Handle competing workloads that would otherwise thrash spinning disk
  ● Handle many nodes of a single job in parallel even if not tuned for huge I/Os
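The "burst, then destage" pattern in the Checkpoint use case can be sketched in a few lines. This is only an illustration under stated assumptions: the two tiers are stood in for by ordinary directories, and the names `burst_checkpoint`, `destage`, `fast_tier`, and `capacity_tier` are hypothetical, not part of any Cray or DataWarp interface.

```python
import shutil
import tempfile
import threading
from pathlib import Path


def burst_checkpoint(data: bytes, name: str, fast_tier: Path) -> Path:
    """Write the checkpoint sequentially to the fast (bandwidth) tier."""
    path = fast_tier / name
    path.write_bytes(data)
    return path


def destage(src: Path, capacity_tier: Path) -> threading.Thread:
    """Copy a checkpoint to the capacity tier in the background, then free the fast tier."""
    def _move() -> None:
        shutil.copy2(src, capacity_tier / src.name)
        src.unlink()

    worker = threading.Thread(target=_move)
    worker.start()
    return worker


def demo() -> tuple[bool, bool, bytes]:
    """Run one burst/destage cycle using temporary directories as stand-in tiers."""
    payload = b"checkpoint-state" * 1024  # hypothetical checkpoint contents
    with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as cap:
        fast_tier, capacity_tier = Path(fast), Path(cap)
        ckpt = burst_checkpoint(payload, "step_0001.ckpt", fast_tier)
        # The application could resume computing here while destaging runs.
        destage(ckpt, capacity_tier).join()
        final = capacity_tier / "step_0001.ckpt"
        return ckpt.exists(), final.exists(), final.read_bytes()
```

The point of the split is that the application only waits for `burst_checkpoint`; the slower copy to the capacity tier overlaps with the next compute phase.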

Customer Workflow Specifications
● Implied Requirements
  ● “launch an application at full system scale in less than 30 seconds…describe factors (such as executable size) that could potentially affect application launch time…describe how applications launch scales with the number of concurrent launch requests (per second) and scale of each launch request”
    ● Translation: Open a bazillion files at once; open and read a single file a bazillion times concurrently
  ● “provide…consistent runtimes (i.e. wall clock time) that do not vary more than 3% from run to run in dedicated mode and 5% in production mode”
    ● Translation: QoS controls on fabric, guaranteed I/O rates regardless of I/O pattern or size
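As a rough illustration of the runtime-consistency requirement, the check below compares a set of wall clock times against the 3% and 5% limits. The quoted text does not say how "vary" is measured, so this sketch assumes the spread between the fastest and slowest run, taken as a fraction of the mean; the function names and the sample runtimes are hypothetical.

```python
from statistics import mean

DEDICATED_LIMIT = 0.03   # 3% in dedicated mode (from the quoted requirement)
PRODUCTION_LIMIT = 0.05  # 5% in production mode (from the quoted requirement)


def runtime_variation(runtimes):
    """Return (slowest - fastest) / mean for a list of wall clock times.

    Assumption: the requirement does not define the metric; this is one
    plausible reading, not the customer's definition.
    """
    if len(runtimes) < 2:
        raise ValueError("need at least two runs to measure variation")
    return (max(runtimes) - min(runtimes)) / mean(runtimes)


def meets_requirement(runtimes, limit):
    """True if the run-to-run variation stays within the given limit."""
    return runtime_variation(runtimes) <= limit


# Hypothetical wall clock times, in seconds, for four runs of the same job.
runs = [100.0, 101.0, 102.0, 104.0]
variation = runtime_variation(runs)
print(f"variation: {variation:.2%}")
print("dedicated mode ok:", meets_requirement(runs, DEDICATED_LIMIT))
print("production mode ok:", meets_requirement(runs, PRODUCTION_LIMIT))
```

With these sample numbers the spread is 4 seconds against a mean of 101.75 seconds, so the job would satisfy the production limit but not the dedicated one.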

DataWarp Summary
[Diagram: compute nodes (CN), DataWarp nodes (DW) with SSDs, and Lnet router nodes (LN) on the Aries network (A); HCAs and an IB fabric leading to the OSSs / OSTs of the Lustre filesystem.]
Legend: CN - Compute Node; LN - Lnet Router Node; DW - DataWarp Node; A - Aries Network

Nastran Example – Forward/Backward Reads
[Figure: file position (left) vs time (bottom). I/O activity in the SCR300 file, showing the forward and backward passes of reading the factored matrix.]
On Lustre, see 3 speeds:
1. File reading forwards, data delivered quickly using Lustre prefetching
2. File reading backwards, data initially comes quickly out of client cache
3. File still reading backwards, data now comes slowly from OSTs
On DataWarp: DataWarp reads both directions at same speed.
Lustre job takes twice as long.
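The forward/backward access pattern can be reproduced with a small timing harness. This is a sketch only, not the Nastran I/O code: the file is a temporary file of random bytes, and `read_pass`, `demo`, and the block size are hypothetical choices. On a small local file both passes will mostly be served from the page cache, so the interesting differences only appear against a real parallel filesystem and a file larger than client memory.

```python
import os
import tempfile
import time


def read_pass(path, block_size, backwards=False):
    """Read the file one block at a time; return (bytes_read, seconds)."""
    size = os.path.getsize(path)
    offsets = list(range(0, size, block_size))
    if backwards:
        offsets.reverse()  # visit the same blocks from the end of the file
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(block_size))
    return total, time.perf_counter() - start


def demo(size=1 << 20, block_size=64 * 1024):
    """Create a scratch file, read it forwards then backwards, and clean up."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size))
        path = tmp.name
    try:
        forward = read_pass(path, block_size)
        backward = read_pass(path, block_size, backwards=True)
    finally:
        os.remove(path)
    return forward, backward


if __name__ == "__main__":
    (fwd_bytes, fwd_secs), (bwd_bytes, bwd_secs) = demo()
    print(f"forward:  {fwd_bytes} bytes in {fwd_secs:.4f} s")
    print(f"backward: {bwd_bytes} bytes in {bwd_secs:.4f} s")
```

Comparing the two timings per block, rather than per pass, is what would expose the three distinct speeds described for Lustre.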

Frequently Unanswered Questions
● Project-Related
  ● New or existing project?
  ● What is the current workflow?
  ● What are the drivers of change?
  ● What must remain the same?
  ● Volume, Variety, Velocity, Veracity
● The “Guzintas and the Guzoutas”
  ● Where does data originate? Internally? Externally?
  ● At what point does it come into your control?
  ● With what frequency, format, data quantity, object quantity?
  ● When is data altered, reduced, multiplied?

Frequently Unanswered Questions
● Consumers
  ● What applications and users access the data over its lifespan?
  ● What are the app interfaces’ requirements?
  ● What is the concurrency and granularity of access?
  ● Profile the moments when data is altered, reduced, scaled, duplicated
  ● Do consumption and transformation yield a new source?
● Data Husbandry
  ● What are the security, provenance, fixity, validation requirements?
  ● How long must the data be retained? Are there legal holds?
  ● How is data expunged? Are there new / emergent requirements?
