SLIDE 1

Bifrost

Easy High-Throughput Computing

github.com/ledatelescope/bifrost

Miles Cranmer (Harvard/McGill), with: Ben Barsdell (NVIDIA), Danny Price (Berkeley), Hugh Garsden (Harvard), Gregory Taylor (UNM), Jayce Dowell (UNM), Frank Schinzel (NRAO), Lincoln Greenhill (Harvard)

SLIDE 2

The Problem:

Every 4 years, an astronomer is killed by inefficient pipeline development

SLIDE 3

The Problem:

Can take 1 year for a team to develop a high-throughput pipeline

  • Say 5 new terrestrial telescopes each year
  • Say 4 astronomers work on pipelines for these

(20 astronomer-years/year)/(80 years life exp.) ≈ 1 astronomer killed every 4 years!

SLIDE 4

Solution: Bifrost

A Pipeline Processing Framework

Bifrost saves lives™*

*(well… it saves time)

SLIDE 5

What is a “High-Throughput Pipeline”?

  • "High-throughput”
  • 10-40+ Gbps per node
  • Pipeline:
  • chain of processing

elements working on a continuous stream of data

“Processing element” “Data transfer”

SLIDE 6

Why is this difficult?

  • Each step works at its own pace
  • Astronomy can’t just scale up hardware
  • Need maximal efficiency
  • Huge data flow on CPU & GPU
  • Have to deal with continuous data flow

SLIDE 7

Bifrost

SLIDE 8

Bifrost pre-cursor: PSRDADA

  • Warp-speed fast, but the C API looks like this:
int example_dada_client_writer_open (dada_client_t* client);
int64_t example_dada_client_write (dada_client_t* client, void* data, uint64_t data_size);
int64_t example_dada_client_writer_write_block (dada_client_t* client, void* data, uint64_t data_size, uint64_t block_id);
int example_dada_client_writer_close (dada_client_t* client, uint64_t bytes_written);

typedef struct {
  dada_hdu_t * hdu;
  multilog_t * log;       // logging interface
  char * header_file;     // file containing DADA header
  char * obs_header;      // contents of the DADA header
  char header_written;    // flag for header I/O
} example_client_writer_t;

void usage () {
  fprintf (stdout,
    "example_dada_client_writer [options] header\n"
    " -k key   hexadecimal shared memory key [default: %x]\n"
    " header   DADA header file containing obs metadata\n",
    DADA_DEFAULT_BLOCK_KEY);
}

/*! Function that opens the data transfer target */
int example_dada_client_writer_open (dada_client_t* client) {
  assert (client != 0);
  example_client_writer_t * ctx = (example_client_writer_t *) client->context;
  assert (ctx != 0);

  // allocate memory for the ASCII DADA header
  ctx->obs_header = (char *) malloc (DADA_DEFAULT_HEADER_SIZE);
  if (!ctx->obs_header) {
    multilog (ctx->log, LOG_ERR, "could not allocate memory\n");
    return (EXIT_FAILURE);
  }

  // read the ASCII DADA header from the file
  if (fileread (ctx->header_file, ctx->obs_header, DADA_DEFAULT_HEADER_SIZE) < 0) {
    free (ctx->obs_header);
    multilog (ctx->log, LOG_ERR, "could not read ASCII header from %s\n", ctx->header_file);
    return (EXIT_FAILURE);
  }

  ctx->header_written = 0;
  return 0;
}

/*! Transfer header/data to data block */
int64_t example_dada_client_writer_write (dada_client_t* client, void* data, uint64_t data_size) {
  assert (client != 0);
  example_client_writer_t * ctx = (example_client_writer_t *) client->context;
  assert (ctx != 0);

  if (!ctx->header_written) {
    // write the obs_header to the header_block
    uint64_t header_size = ipcbuf_get_bufsz (ctx->hdu->header_block);
    char * header = ipcbuf_get_next_write (ctx->hdu->header_block);
    memcpy (header, ctx->obs_header, header_size);

    // flag the header block for this "observation" as filled
    if (ipcbuf_mark_filled (ctx->hdu->header_block, header_size) < 0) {
      multilog (ctx->log, LOG_ERR, "could not mark header block filled\n");
      return -1;
    }
    ctx->header_written = 1;
  } else {
    // write data_size bytes to the data_block
    memset (data, 0, data_size);
  }
  return data_size;
}

/*! Transfer data to data block, 1 block only */
int64_t example_dada_client_writer_write_block (dada_client_t* client, void* data, uint64_t data_size, uint64_t block_id) {
  assert (client != 0);
  example_client_writer_t * ctx = (example_client_writer_t *) client->context;
  assert (ctx != 0);
  // write 1 block of data
  memset (data, 0, data_size);
  return data_size;
}

/*! Function that closes socket */
int example_dada_client_writer_close (dada_client_t* client, uint64_t bytes_written) {
  assert (client != 0);
  example_client_writer_t * ctx = (example_client_writer_t *) client->context;
  assert (ctx != 0);
  return 0;
}

SLIDE 9

Radio astronomy pipelines need:

  • Maximal efficiency
  • High-throughput
  • Long deployments

data_source(*params) → function1(*params) → function2(*params) → data_sink(*params)

What about productivity? Arranging this should be simple! Why does it need to be immensely complicated?
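
The arrangement an astronomer actually wants to write is little more than function composition over a stream. A toy sketch with stand-in stages (not Bifrost code; every name here is a placeholder):

def data_source(n_chunks):
    # Pretend source: yield gulps of samples from a file or socket.
    for i in range(n_chunks):
        yield [float(i)] * 4

def function1(chunk):
    # Pretend processing stage 1 (e.g. calibrate).
    return [x * 2.0 for x in chunk]

def function2(chunk):
    # Pretend processing stage 2 (e.g. detect).
    return [x * x for x in chunk]

def data_sink(chunk):
    # Pretend sink: write to disk.
    print(chunk)

for chunk in data_source(3):
    data_sink(function2(function1(chunk)))

Everything the loop hides (threads, ring buffers, keeping the GPU fed) is what the framework should handle.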

SLIDE 10

Rings and Blocks

SLIDE 11

Exhibit A

  • Want to do:

file read -> GPU STFT -> file write

  • What comes most naturally?
  • Functions applied to results of other functions…
  • So… make that the API

SLIDE 12

SLIDE 13


  • Create a block object which reads in data at a certain rate
  • Modify the block, chunk up the time series

SLIDE 14


  • Implicitly pass the ring buffer within a block to the input ring of the next block
  • Move around axes with labels
  • Convert the data type and write to disk

SLIDE 15

(start threads)
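
Put together, slides 12-15 build up a pipeline along these lines. This is a minimal sketch in Bifrost's Python API, modelled on the project's tutorial example: the block names (read_wav, copy, fft, detect, quantize, write_sigproc), the views.split_axis helper, the keyword arguments, and the axis labels are assumptions to check against the bifrost.blocks documentation, and 'data.wav' is a placeholder file.

import bifrost as bf
import bifrost.blocks as blocks
import bifrost.views as views

# Create a block object that reads in data at a certain rate
# (one "gulp" of frames per read); 'data.wav' is a placeholder.
data = blocks.read_wav(['data.wav'], gulp_nframe=4096)

# The ring buffer within each block is implicitly passed as the input
# ring of the next block; here the stream is copied onto the GPU.
data = blocks.copy(data, space='cuda')

# Chunk up the time series into short windows, FFT each window, and
# square-law detect to get power spectra (the STFT).
data = views.split_axis(data, 'time', 256, label='fine_time')
data = blocks.fft(data, axes='fine_time', axis_labels='freq')
data = blocks.detect(data, mode='scalar')

# Move around axes by label (ordering here is illustrative), convert
# the data type, and write the result back to disk.
data = blocks.transpose(data, ['time', 'pol', 'freq'])
data = blocks.copy(data, space='system')
data = blocks.quantize(data, 'i8')
blocks.write_sigproc(data)

# Start the threads: every block runs in its own thread, streaming
# data through its rings until the source is exhausted.
pipeline = bf.get_default_pipeline()
pipeline.run()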

SLIDE 16

What did we lose? Some overhead to Python.
What did we save? Our sanity, our time, etc.

SLIDE 17

Exhibit B: Alice & Bob

Two astronomers want to collaborate on an app.

But…

  • Bob writes a dedispersion code in C/CUDA; Alice writes a harmonic sum code in NumPy/PyCUDA
  • Bob outputs volumes in barn-megaparsecs; Alice wants cubic furlongs
  • Both use different axes

They also don’t talk to each other…

SLIDE 18

How do we make this unfortunate collaboration painless?

modularity, modularity, modularity

SLIDE 19

Modularity = Blocks and Metadata

  • Blocks are black-box algorithms; input/output goes through rings
  • Ring headers describe the data (see the sketch below):
  • Units
  • Axes
  • GPU/CPU
  • Blocks can’t see each other, yet fit together seamlessly
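
For illustration, a sequence header is just a small dictionary of metadata that rides along with the ring. The field names below follow the '_tensor' convention used by Bifrost's high-level blocks, but treat the exact schema and all values as assumptions to check against the docs:

header = {
    'name': 'example_scan',          # hypothetical observation name
    '_tensor': {
        'dtype':  'cf32',            # complex float32 samples
        'shape':  [-1, 2, 4096],     # (frames, pols, channels); -1 = streamed axis
        'labels': ['time', 'pol', 'freq'],            # named axes
        'scales': [(0.0, 1e-3), None, (30e6, 25e3)],  # (start, step) per axis
        'units':  ['s', None, 'Hz'],                  # physical units per axis
    },
}
# Whether the ring itself lives in GPU ('cuda') or system memory is
# tracked alongside this header, so blocks can check where the data is.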

SLIDE 20

Exhibit C: Target of Opportunity (ToO)

10 Gb/s telescope backend: algorithms must be inserted into the pipeline for a ToO observation, and it has to be done in 20 minutes!

Need to:

1. Average data
2. Matrix multiply
3. Element-wise square

And it all has to be on the GPU!

SLIDE 21

Bifrost = Pipeline Framework, Block Library

  • Accumulate
  • Matrix multiply (with a constant)
  • CUDA JIT compiler (inside a user-defined block; see the sketch below)
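
To give a flavor of the last item: bifrost.map JIT-compiles a small CUDA kernel from an expression string over named GPU arrays, so an element-wise square is a one-liner. A minimal sketch, assuming the expression-string-plus-dict call form shown in the project's examples (requires a CUDA-capable GPU):

import numpy as np
import bifrost as bf

# Allocate input and output arrays in GPU memory.
x = bf.ndarray(np.arange(8, dtype=np.float32), space='cuda')
y = bf.ndarray(np.zeros(8, dtype=np.float32), space='cuda')

# Element-wise square: the expression string is JIT-compiled to a CUDA
# kernel the first time it is used, then cached.
bf.map('y = x * x', {'x': x, 'y': y})

# Copy back to system memory to inspect the result.
print(y.copy(space='system'))   # [ 0.  1.  4.  9. 16. 25. 36. 49.]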

SLIDE 22

Bifrost: Deployed on LWA-SV (telescope)

  • 34 MHz live stream at LWA TV Channel 2:
  • phys.unm.edu/~lwa/lwatv2.html
  • (Or, Google: “LWA TV”, click first link, go to channel 2)

SLIDE 23

Questions?

  • GitHub: /ledatelescope/bifrost
  • Paper in prep, to be submitted to JAI
  • LWA 2 live stream: phys.unm.edu/~lwa/lwatv2.html
