Bifrost: Easy GPU Pipeline Development



SLIDE 1

8/14/17 Miles Cranmer 1

Bifrost: Easy GPU Pipeline Development

github.com/ledatelescope/bifrost

  • Presenter: Miles Cranmer (CfA/McGill)
  • On behalf of: Ben Barsdell (NVIDIA), Danny Price (Berkeley), Jayce Dowell (UNM), Hugh Garsden (CfA), Frank Schinzel (NRAO), Greg Taylor (UNM), Lincoln Greenhill (CfA)

SLIDE 2

Stream-processing and real-time GPU computing

  • Stream-processing: operating on data that is potentially unlimited in extent
      • E.g., time stream of digitized voltages
  • Nontrivial for CPU/GPU systems:
      • Creation of data structures for buffer memory management, packet capture
      • Additional complexities for asynchronous copies and kernel execution
      • Manual parallelization/core binding of algorithms and pipelines
      • Potential issues include memory leaks and race conditions
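To make "potentially unlimited in extent" concrete, here is a minimal pure-Python sketch of consuming an endless sample stream in fixed-size gulps. The names `voltage_stream`, `gulps`, and `mean_power` are hypothetical and are not part of Bifrost; the sample values are a synthetic stand-in for digitized voltages.

```python
from itertools import count, islice

def voltage_stream():
    """Hypothetical unbounded source: yields one synthetic sample at a time."""
    for n in count():
        yield (n % 16) - 8  # stand-in for a digitized voltage

def gulps(stream, gulp_size):
    """Group a stream into fixed-size lists without holding the whole stream."""
    while True:
        chunk = list(islice(stream, gulp_size))
        if not chunk:
            return
        yield chunk

def mean_power(chunk):
    """Average of the squared samples in one gulp."""
    return sum(x * x for x in chunk) / len(chunk)

# The stream never ends, so only the first three gulps are processed here.
powers = [mean_power(g) for g in islice(gulps(voltage_stream(), 16), 3)]
```

Memory use depends on the gulp size, not on how long the stream runs, which is the property a streaming framework has to preserve.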

SLIDE 3

Bifrost is deployed in the wild:

  • Backend for newest LWA station in NM
  • Bifrost-powered data capture for live all-sky image
      • Google: “LWA TV 2”
  • Pulsar detection:
      • Validation timing within 0.0001 ms of canonical for PSR B0834+06 (well within 1σ of measurement)

SLIDE 4

Bifrost core concepts

  • Blocks
      • Independent thread
      • “Black box” algorithm
  • Ring buffers (Rings)
      • Emulates wrap-around in memory
  • Memory spaces
      • Rings assigned to specific “space”
  • Pipelines
      • Combination of the above
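The block/pipeline idea can be sketched with the Python standard library: each block runs in its own thread and passes gulps downstream. This is an illustration only. `queue.Queue` stands in for a ring, and `source`, `square`, `sink`, and `run_pipeline` are hypothetical names, not Bifrost APIs.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream

def source(out_q, n_gulps):
    """Source block: emits small gulps of integers."""
    for i in range(n_gulps):
        out_q.put([i, i + 1, i + 2])
    out_q.put(STOP)

def square(in_q, out_q):
    """Transform block: a 'black box' that squares every sample."""
    while True:
        gulp = in_q.get()
        if gulp is STOP:
            out_q.put(STOP)
            return
        out_q.put([x * x for x in gulp])

def sink(in_q, results):
    """Sink block: reduces each gulp to a single number."""
    while True:
        gulp = in_q.get()
        if gulp is STOP:
            return
        results.append(sum(gulp))

def run_pipeline(n_gulps=4):
    # Bounded queues give back-pressure, loosely like a fixed-size ring.
    q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    results = []
    threads = [
        threading.Thread(target=source, args=(q1, n_gulps)),
        threading.Thread(target=square, args=(q1, q2)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Unlike a ring, a queue hands each item to exactly one consumer, so this sketch cannot express branched pipelines.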

SLIDE 5

The Bifrost framework

  • Python frontend wraps fast C/C++/CUDA backend
  • Frontend:
      • Blocks and Pipelines are Python object abstractions for the backend
      • ND-array object for memory management (span of ring buffer)
      • ctypes wraps all C calls
  • Backend:
      • Common type definitions and “BFarray” generic data structure
      • “Ring buffer” used for inter-block communication
      • Several common modules implemented

SLIDE 6

Ring Buffer implementation

  • Multiple readers, single writer ⇒ branched pipelines OK
  • Thread safe
  • Allocated in system (CPU), cuda (GPU), or cuda_host (pinned CPU) memory

  • What’s unique?
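A toy model of the single-writer, multiple-reader ring may help picture the wrap-around. This is not Bifrost's implementation: the class `Ring` and its methods are hypothetical, it stores Python objects instead of raw memory, and it omits the locking that a thread-safe ring needs.

```python
class Ring:
    """Toy fixed-size ring: one writer, any number of independent readers."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0    # total number of items written so far
        self.tails = {}  # reader name -> total number of items read so far

    def open_reader(self, name):
        # A new reader starts at the current write position.
        self.tails[name] = self.head

    def write(self, item):
        slowest = min(self.tails.values(), default=self.head)
        if self.head - slowest >= self.capacity:
            return False  # full: writing would overwrite unread data
        self.buf[self.head % self.capacity] = item  # index wraps around
        self.head += 1
        return True

    def read(self, name):
        tail = self.tails[name]
        if tail == self.head:
            return None  # nothing new for this reader
        self.tails[name] = tail + 1
        return self.buf[tail % self.capacity]

ring = Ring(capacity=4)
ring.open_reader("fft")
ring.open_reader("disk")
wrote = [ring.write(x) for x in range(4)]          # fills the ring
full = ring.write(99)                              # rejected: ring is full
fft_first = [ring.read("fft") for _ in range(2)]   # one reader advances
still_full = ring.write(99)                        # "disk" has read nothing yet
disk_all = [ring.read("disk") for _ in range(4)]   # slow reader catches up
wrapped = ring.write(4)                            # reuses slot 0
fft_rest = [ring.read("fft") for _ in range(3)]
```

Each reader keeps its own position, so two branches of a pipeline can consume the same data at different rates, and the writer is limited by the slowest reader.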

SLIDE 7

API example 1: block

    from copy import deepcopy

    import bifrost as bf
    from bifrost.pipeline import TransformBlock

    class QuantizeBlock(TransformBlock):
        def __init__(self, iring, dtype, scale=1., *args, **kwargs):
            TransformBlock.__init__(self, iring, *args, **kwargs)
            self.dtype = dtype
            self.scale = scale

        def on_sequence(self, isequence):
            # Copy the input header and set the output data type
            ohdr = deepcopy(isequence.header)
            ohdr['_tensor']['dtype'] = self.dtype
            return ohdr

        def on_data(self, ispan, ospan):
            # Quantize one span of input data into the output span
            bf.quantize.quantize(ispan.data, ospan.data, self.scale)
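For readers without a Bifrost install, the two steps of the block above can be mimicked in plain Python. This is only a sketch under assumptions: `output_header` and `quantize_i8` are hypothetical helpers, the `'f32'` dtype string is made up for the example, and the round-then-saturate behavior is an assumption rather than a statement of what `bf.quantize.quantize` does internally.

```python
from copy import deepcopy

def output_header(iheader, dtype):
    """Mimic on_sequence: copy the input header and change only the dtype."""
    ohdr = deepcopy(iheader)
    ohdr['_tensor']['dtype'] = dtype
    return ohdr

def quantize_i8(samples, scale=1.0):
    """Mimic on_data: scale, round, and saturate to the signed 8-bit range.

    Rounding mode and saturation are assumptions made for this sketch.
    """
    out = []
    for x in samples:
        q = int(round(x * scale))
        out.append(max(-128, min(127, q)))
    return out

ihdr = {'_tensor': {'dtype': 'f32'}}  # hypothetical input header
ohdr = output_header(ihdr, 'i8')
quantized = quantize_i8([0.4, -1.6, 300.0, -300.0])
```

The deep copy matters because the same input header may be seen by other readers of the ring; editing it in place would change what they observe.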

SLIDE 8

API example 2: pipeline

    import bifrost as bf

    bc = bf.BlockChainer()
    # Read in file
    bc.blocks.read_wav(['audio_file.wav'], gulp_nframe=4096)
    # Copy to GPU
    bc.blocks.copy(space='cuda')
    # FFT
    bc.views.split_axis('time', 256, label='fine_time')
    bc.blocks.fft(axes='fine_time', axis_labels='freq')
    # Square modulus
    bc.blocks.detect(mode='scalar')
    # Transpose
    bc.blocks.transpose(['time', 'pol', 'freq'])
    # Copy back to CPU
    bc.blocks.copy(space='cuda_host')
    # Convert to 8-bit integer
    bc.blocks.quantize('i8')
    # Save
    bc.blocks.write_sigproc()
    # Run the pipeline
    pipeline = bf.get_default_pipeline()
    pipeline.shutdown_on_signals()
    pipeline.run()
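The numerical core of this pipeline (split the time axis, FFT each chunk, take the square modulus) can be checked on the CPU with a few lines of standard-library Python. This is a sketch, not Bifrost code: the helper names mirror the block names for readability only, a naive DFT replaces the GPU FFT, and a frame length of 16 is used instead of 256 to keep the naive transform cheap.

```python
import cmath
import math

def split_axis(samples, n):
    """Cut a 1-D time series into consecutive frames of n samples."""
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]

def dft(frame):
    """Naive O(n^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]

def detect(spectrum):
    """Square modulus of each complex channel."""
    return [abs(z) ** 2 for z in spectrum]

n = 16
tone_bin = 3
# Synthetic input: a pure tone that falls exactly on frequency channel 3.
samples = [math.cos(2 * math.pi * tone_bin * t / n) for t in range(4 * n)]

power = [detect(dft(frame)) for frame in split_axis(samples, n)]
peak = max(range(n // 2), key=lambda k: power[0][k])
```

Each input frame becomes one row of channel powers, which is the time-by-frequency layout the transpose and write steps then operate on.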

SLIDE 9

bf.map

  • Easy CUDA kernel generation from Bifrost
  • JIT compiler uses NVRTC

    # Create three arrays on the GPU: inputs a and b, and an empty output c
    a = bf.ndarray([1,2,3,4,5], space='cuda')
    b = bf.ndarray([1,0,1,0,1], space='cuda')
    c = bf.empty(5, space='cuda')
    # Add a and b together
    bf.map("c = a + b", data={'c': c, 'a': a, 'b': b})

SLIDE 10

bf.map

Explicit indexing also supported. Outer product:

    bf.map("c(i,j) = a(i) * b(j)", {'c': c, 'a': a, 'b': b}, axis_names=('i','j'))
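As a CPU reference for what the two `bf.map` expressions compute, the same arithmetic can be written with plain Python lists. This sketch runs no GPU code and says nothing about how Bifrost generates or launches the kernel; the names `c_sum` and `c_outer` are introduced here for illustration, and the inputs are the arrays from the earlier slide.

```python
a = [1, 2, 3, 4, 5]
b = [1, 0, 1, 0, 1]

# "c = a + b": each output element depends only on the matching inputs.
c_sum = [ai + bi for ai, bi in zip(a, b)]

# "c(i,j) = a(i) * b(j)": explicit indices i and j span a 2-D output.
c_outer = [[a[i] * b[j] for j in range(len(b))] for i in range(len(a))]
```

Note that the outer product needs a two-dimensional output of shape (5, 5), unlike the one-dimensional output of the elementwise sum.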

SLIDE 11

Why Bifrost?

SLIDE 12

Astronomy-specific

  • Bifrost developed in parallel with LWA-SV, driven by radio astronomy applications
      • ⇒ Core structural advantages for astronomy
  • Ring features
      • Metadata describes the units of ring buffer dimensions; used in algorithms (e.g., dedispersion)
      • Multi-sequence ring buffers, useful for different observations. The metadata will propagate down the pipeline.
      • Time-tagged sequences in ring buffers ⇒ can dump section of data to disk based on time range, observation name
          • Useful for detections of transient phenomena
  • Ndarray is a child of numpy.ndarray ⇒ compatibility with many numpy functions, matplotlib, etc.
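The idea of dumping data by time range or observation name can be illustrated with a small stand-alone model. Everything here is hypothetical: the `Sequence` dataclass, its field names, the units, and the `select` helper are invented for this sketch and do not reflect Bifrost's header layout or API.

```python
from dataclasses import dataclass, field

@dataclass
class Sequence:
    name: str        # observation name
    time_tag: float  # start time of the sequence, in seconds (assumed unit)
    dt: float        # seconds per frame
    frames: list = field(default_factory=list)

    def time_of(self, i):
        """Timestamp of frame i, derived from the sequence metadata."""
        return self.time_tag + i * self.dt

def select(sequences, t_start, t_stop, name=None):
    """Return (name, frame) pairs with timestamps in [t_start, t_stop)."""
    hits = []
    for seq in sequences:
        if name is not None and seq.name != name:
            continue
        for i, frame in enumerate(seq.frames):
            if t_start <= seq.time_of(i) < t_stop:
                hits.append((seq.name, frame))
    return hits

# Two observations held in the same (toy) ring.
ring = [
    Sequence("obs_A", time_tag=0.0, dt=0.5, frames=["a0", "a1", "a2", "a3"]),
    Sequence("obs_B", time_tag=10.0, dt=0.5, frames=["b0", "b1", "b2"]),
]
dump = select(ring, t_start=1.0, t_stop=10.6)
only_b = select(ring, t_start=0.0, t_stop=100.0, name="obs_B")
```

Because each frame's timestamp follows from the sequence metadata, a trigger from a transient detection only needs to supply a time window to retrieve the matching data.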

SLIDE 13

Block library

Many astronomy and general processing blocks already built

  • State of the art and flexible high-performance implementations
  • Metadata rich
  • Well-documented
  • Flexible dimensions

These include:

  • accumulate
  • audio
  • binary_io
  • detect
  • fdmt
  • fft
  • fftshift
  • guppi_raw
  • quantize
  • reduce
  • reverse
  • serialize
  • sigproc
  • transpose
  • unpack
  • wav

SLIDE 14

Logging and performance benchmarking

  • getirq
  • getsiblings
  • like_bmon
  • like_ps
  • like_top
  • pipeline2dot
  • setirq

SLIDE 15

Rapid development speed; high performance

Bifrost code vs. C++ legacy:

SLIDE 16

Rapid development speed; high performance

SLIDE 17

Rapid development speed; high performance

SLIDE 18

Conclusion

  • Future work
      • PSRDADA – Bifrost block
          • To enable capture with PSRDADA to a Bifrost ring for post-processing
      • Additional options for visualization, “ScopeBlock”
          • Visualize ring contents in real-time
      • Aiming for full support of correlation, pulsar/transient backend pipelines

github.com/ledatelescope/bifrost

(or, Google: “leda telescope bifrost”)
