Workflow approaches in high throughput neuroscientific research. - PowerPoint PPT Presentation

Workflow approaches in high throughput neuroscientific research. Jake Carroll - Senior ICT Manager, Research The Queensland Brain Institute, UQ, Australia jake.carroll@uq.edu.au

What is QBI? • The Queensland Brain Institute is one of the largest (and probably the most computationally + storage intensive) neuroscience research focused institutes in the world. • Labs are dedicated to understanding the fundamental mechanisms that regulate brain function. • We’re working to solve some of the greatest problems that humanity faces in terms of mental illness. • QBI is an early adopter. We are the crazy ones.

Why am I here? • I came to learn, primarily. A great audience, a great set of people speaking. A wealth of capability and experience in this crowd. • I came to show you how workflows matter to my industry and the evolving nature of storage in this space. • I came to discuss how we can revolutionise storage platforms of best fit, together, with workflows at the centre of the design principles.

What types of science drive our workloads? • Basic biology. • Computational neuroscience. • Complex trait genomics (you thought NGS was data-intensive? Check this stuff out!) • Electrophysiology. • Cognitive neurosciences. • Computational biology.

What does QBI want with workflows? • Traditional beginnings: • Big supers, big storage, significant complexity. Clever people using clever things to find the clever answers to complex questions, in theory. • Turns out, biologists don't have the time to learn the ins and outs of parallel filesystem semantics or job scheduler eccentricities. • They just want to get their work done, put it somewhere and publish, 99.95% of the time. • Every aspect of the scientific "life" in the lab can be expressed 'in-silico' as a workflow, so we've found. This pays some homage to Ian Corners "birth, death and marriage" registration concept of data.

There are two user-types. • A wet lab biologist. • A computer scientist. Guess who has more sophisticated needs? Hint: it isn't the computer scientist.

How are we helping our people? • We are, in fact, building pipelines and workflow engines. • Building tools to get data "up and out" and to the right locations, harvesting metadata along the way. • People without backgrounds in HPC only peripherally appreciate the difference between scratch, campaign and archival storage. At the end of the day, they shouldn't need to care: the workflow should be smart enough to put their data where it best fits, based upon the workflow. • When we build, we build for the workflow, not the IOPS or throughput of XYZ disk array.
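A minimal sketch of the "workflow decides where data lives" idea follows. It assumes three tiers named after the scratch/campaign/archival split above. The stage labels and the 90-day threshold are hypothetical and do not come from QBI's tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SCRATCH = "scratch"    # fast, transient: e.g. flash next to the GPUs
    CAMPAIGN = "campaign"  # project lifetime: e.g. Ceph or disk
    ARCHIVE = "archive"    # long-term, rarely touched: e.g. tape


@dataclass
class Dataset:
    name: str
    stage: str              # hypothetical workflow stage label
    days_since_access: int


def place(ds: Dataset) -> Tier:
    """Choose a tier from the workflow stage, not from the user's choice.

    Stage names and the 90-day threshold are illustrative assumptions.
    """
    if ds.stage in ("acquired", "deconvolving"):
        return Tier.SCRATCH          # hot data feeding the GPU step
    if ds.stage == "published" or ds.days_since_access > 90:
        return Tier.ARCHIVE          # finished or cold data
    return Tier.CAMPAIGN             # everything in active analysis
```

For example, `place(Dataset("zstack-001", "acquired", 0))` yields `Tier.SCRATCH`. The biologist never names a tier. The workflow stage does that.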

Our image deconvolution workflow • First, what is deconvolution? • Deconvolution is a mathematical operation used in image restoration to recover an object from an image that is degraded by blurring and noise. In fluorescence microscopy, the blurring is largely due to diffraction-limited imaging by the instrument; the noise is mainly photon-induced. • Our version of this runs on GPUs [NVIDIA K80s]. P100s if NVIDIA will let me near them…
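Richardson–Lucy is one classic iterative algorithm for the blur-plus-photon-noise model described above. It is the maximum-likelihood iteration under Poisson noise. The sketch below is a 1-D NumPy illustration only, not QBI's GPU implementation, because the slides do not say which algorithm is used. It assumes a known, odd-length point-spread function (PSF) and noise-free test data.

```python
import numpy as np


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Richardson-Lucy deconvolution of a 1-D signal (illustrative sketch).

    Assumes an odd-length PSF so that mode="same" keeps the flipped
    kernel aligned with the forward blur.
    """
    observed = np.asarray(observed, dtype=float)
    psf = np.asarray(psf, dtype=float)
    psf = psf / psf.sum()             # normalise so total flux is conserved
    psf_flipped = psf[::-1]           # flipped kernel = adjoint of the blur
    estimate = np.full_like(observed, observed.mean())  # flat, positive start
    for _ in range(iterations):
        reblurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (reblurred + eps)             # data vs. model
        estimate *= np.convolve(ratio, psf_flipped, mode="same")
    return estimate
```

Blurring two point sources with a Gaussian PSF and running `richardson_lucy(blurred, psf, iterations=200)` gives sharper peaks than the blurred input. Each update is multiplicative, so the estimate stays non-negative, as photon counts must.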

The Huygens-Fresnel principle states that every point on a wave-front is a source of wavelets. These wavelets spread out in the same forward direction, at the same speed as the source wave. The new wave-front is a line tangent to all of these wavelets.

Spinning disk Z-stack: no deconvolution vs. with deconvolution. • 5 GB/sec of PCI-E bandwidth for one hour. • 86,000,000,000 neurons in a human brain.
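Sustaining that rate for an hour moves a great deal of data. A quick back-of-envelope check follows. It assumes decimal units (1 GB = 10^9 bytes) and that the 5 GB/s is held for the full hour:

```latex
5\ \tfrac{\text{GB}}{\text{s}} \times 3600\ \text{s} = 18\,000\ \text{GB} = 18\ \text{TB}
```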

Our pipeline: 1. Acquire data at the scope. 2. The uploader gathers metadata and dumps the data into object storage or POSIX, depending upon infrastructure and workload. 3. Automatic deconvolution on GPU; deconvolved data comes back from the GPU array. Storage tiers: flash, Ceph (volume store as XFS), disk, tape. Then all the metadata about all of this runs off to "the repository" so it is searchable, indexable, reusable and discoverable. That's an immutable, fixity-assured experiment in-silico, right there.
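The "fixity-assured" step can be sketched as a checksum taken when the data lands and re-checked later. This sketch assumes SHA-256 as the fixity algorithm and uses a hypothetical record layout. The slides do not say what the repository actually stores.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash in chunks so multi-GB Z-stacks never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_record(path: Path, instrument: str, stage: str) -> dict:
    """Hypothetical metadata record sent to 'the repository' for one file."""
    return {
        "file": path.name,
        "bytes": path.stat().st_size,
        "sha256": sha256_of(path),
        "instrument": instrument,
        "stage": stage,  # e.g. "acquired" or "deconvolved"
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(path: Path, record: dict) -> bool:
    """Re-hash later; a mismatch means the stored experiment has changed."""
    return sha256_of(path) == record["sha256"]
```

The record is taken once, at upload. Running `verify` later, for example before re-analysis, is what makes the experiment checkably immutable rather than immutable by convention.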

What does the repository look like?

Massive, multi-domain-aware workflow and workload metadata consolidation in an object DB, effectively. Sources: • NGS/genomics sequencers • DICOM/human model data • High-end super-res + confocal microscopy • Bioinformatic analytics • Ephys + DBS • Multi-PB object databases for translational workload correlation
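Cross-domain correlation over consolidated metadata might look like the function below, which groups records from different instruments by a shared key. The record fields (`domain`, `subject_id`) are hypothetical. An in-memory dict stands in for the object DB, which would do this indexing server-side.

```python
from collections import defaultdict


def correlate(records, key="subject_id"):
    """Group metadata records from different domains by a shared key.

    Returns {key value: {domain: [records]}} for keys that appear in
    more than one domain, i.e. the cross-domain hits worth re-analysing.
    """
    by_key = defaultdict(dict)
    for rec in records:
        if key in rec:  # skip records that lack the join key
            by_key[rec[key]].setdefault(rec["domain"], []).append(rec)
    return {k: v for k, v in by_key.items() if len(v) > 1}
```

For instance, a sequencing run and an MRI series tagged with the same subject would come back together, while an ephys session for a different subject would not.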

And it is getting worse. A 100,000 x 100,000 pixel cyst in a 3D deconvolved reconstruction of around 4TB of image data per sample. Life is getting harder, in the life sciences - so we need to work smarter…
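A rough estimate shows why a single sample reaches terabytes. It assumes one channel at 16 bits (2 bytes) per pixel, which the slide does not state:

```latex
\begin{aligned}
10^{5} \times 10^{5}\ \text{px} \times 2\ \tfrac{\text{B}}{\text{px}} &= 2\times10^{10}\ \text{B} = 20\ \text{GB per plane}\\
\frac{4\ \text{TB}}{20\ \text{GB/plane}} = \frac{4\times10^{12}\ \text{B}}{2\times10^{10}\ \text{B/plane}} &= 200\ \text{planes}
\end{aligned}
```

That works out to roughly 200 Z-planes per sample under those assumptions.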

No better time than now to start embedding hints in your filesystem design. • Build me storage subsystems that are aware of locality, compute workloads, IO patterns and IO personas. • (Please) stop thinking monolithically. Think about patterns and use-case modularity. • How cool would a fresh, reasonable data locality language or interface definition technology be, one that proliferates across compute, storage, the network and software? And no, I don't mean DMAPI…
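POSIX already offers one narrow, per-file hint, `posix_fadvise`, which shows how little today's interfaces can express. The sketch maps a hypothetical `IOPersona` description onto it. It illustrates the gap the slide describes and is not the interface being asked for. The persona fields are assumptions. On platforms without `os.posix_fadvise`, the function only returns the advice it would have given.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class IOPersona:
    """Hypothetical description of how a workflow stage touches a file."""
    name: str
    sequential: bool   # streams front to back, e.g. reading a Z-stack
    reuse_soon: bool   # will be re-read shortly, e.g. by the next GPU pass


def advise(path: str, persona: IOPersona) -> str:
    """Translate a persona into a whole-file POSIX fadvise hint.

    Returns the name of the advice chosen; sequential access takes
    precedence over reuse in this sketch.
    """
    if persona.sequential:
        name = "POSIX_FADV_SEQUENTIAL"  # kernel may read ahead more aggressively
    elif persona.reuse_soon:
        name = "POSIX_FADV_WILLNEED"    # kernel may start prefetching now
    else:
        name = "POSIX_FADV_RANDOM"      # kernel may disable read-ahead
    if hasattr(os, "posix_fadvise"):
        fd = os.open(path, os.O_RDONLY)
        try:
            os.posix_fadvise(fd, 0, 0, getattr(os, name))  # len 0 = to EOF
        finally:
            os.close(fd)
    return name
```

The hint covers one file, one host and one access pattern. It says nothing about compute placement, the network or the next workflow stage, and those are the gaps the slide points at.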

The take aways… • Cross domain scientific research generates rich metadata for indexability, discoverability and reuse. • Don’t lose the lessons. • Correlation and re-analysis,

Information flow.
