
Coprocessor Accelerated Filterbank Extension Library: Mummy, are we there yet?
Jan Krämer
DLR Institute of Communication and Navigation (IKN)
04.02.2018

Overview
◮ Introduction
◮ Arbitrary Resampler
◮ Transition to the GPU
◮ Open Sourcing

Introduction: Who am I?
Jan Krämer
◮ Software Defined Radio Imposter at German Aerospace Centre Oberpfaffenhofen
◮ General interest in making stuff a bit faster

Introduction: I fought my own officemate for rights to that name...
CAFE is the Coprocessor Accelerated Filterbank Extensions Library
Realtime Polyphase Filterbank Channelizer (PFB-C)
◮ 45 channels
◮ 1550 tap filter
◮ 4 MSamples/s needed
◮ Optimized CPU version: 1-2 MSamples/s

Introduction: Regular ordinary frametitle, no memes here
GPGPU TO THE RESCUE!!!

Introduction: Yo check me out, I'm awesome
◮ Channelizer already presented last year
◮ Can oversample the output by any factor that is an integer division of the channel number (e.g. 3x oversampled = 45 channels / 15)
◮ Able to achieve 110 MSamples/s (45 channels, 1550 tap prototype filter)
◮ Now does the cuFFT output reshuffle → additional performance gains are expected
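To make the oversampling options concrete, here is a minimal Python sketch that lists the factors available for a given channel count. It assumes, as in the 45/15 example, that the oversampling factor is the channel count divided by one of its integer divisors. The function name is made up for this illustration and is not part of CAFE.

```python
def oversampling_factors(n_channels):
    """Return every factor n_channels / d, where d is an integer divisor of n_channels."""
    divisors = [d for d in range(1, n_channels + 1) if n_channels % d == 0]
    return sorted(n_channels // d for d in divisors)


print(oversampling_factors(45))  # [1, 3, 5, 9, 15, 45]
```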

Introduction: Who wrote those specs...
◮ Timing sync needs 4x oversampling factor
◮ PFB-C gets to 4.2666x oversampling factor
◮ Arbitrary resampler needed
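A quick worked calculation of the resampling ratio this implies. It assumes that 4.2666 is the rounded form of 64/15 and that the resampler rate is defined as output rate divided by input rate. Neither assumption is stated on the slide.

```python
from fractions import Fraction

pfb_oversampling = Fraction(64, 15)   # assumption: 4.2666... read as 64/15
target_oversampling = Fraction(4)     # what the timing sync needs

# Ratio the arbitrary resampler has to realise (output rate / input rate)
rate = target_oversampling / pfb_oversampling
print(rate, float(rate))  # 15/16 0.9375
```

A ratio like this sits slightly below 1, so no plain integer decimator can produce it.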

Arbitrary Resampler: Bloody Resamplers, how do they work?
◮ Use a PFB to "upsample" the signal
◮ Downsample by skipping the right filters in the bank
◮ Filter the signal with the normal filter and a differential filter in parallel
◮ Interpolate between the two filter outputs
◮ Profit

Arbitrary Resampler: I wish I had a mouse to draw this...
Start with the normal vector of taps

Arbitrary Resampler: Halp...this is LibreOffice Draw
Add the differential tap vector
diff_tap[i] = tap[i + 1] - tap[i]    (1)

Arbitrary Resampler: Oh god I suck at graphics
Usual partitioning is applied...
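The two tap vectors and their partitioning can be sketched in a few lines of Python. This is an illustration under two assumptions: the last differential tap is padded with 0.0 so both vectors have the same length, and "usual partitioning" means the standard polyphase decomposition, where arm k takes every N-th tap starting at k. The function names and the toy prototype values are made up for this example.

```python
def differential_taps(taps):
    """diff_tap[i] = tap[i + 1] - tap[i]; the last entry is padded with 0.0 (assumption)."""
    diffs = [taps[i + 1] - taps[i] for i in range(len(taps) - 1)]
    return diffs + [0.0]


def partition(taps, n_filters):
    """Polyphase partition: arm k gets taps k, k + n_filters, k + 2 * n_filters, ..."""
    return [taps[k::n_filters] for k in range(n_filters)]


prototype = [0.0, 1.0, 3.0, 6.0, 3.0, 1.0]  # toy values, not a real filter design
normal_bank = partition(prototype, 3)
diff_bank = partition(differential_taps(prototype), 3)
print(normal_bank)  # [[0.0, 6.0], [1.0, 3.0], [3.0, 1.0]]
print(diff_bank)    # [[1.0, -3.0], [2.0, -2.0], [3.0, 0.0]]
```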

Arbitrary Resampler: Breakdown of operations
◮ interpolation_rate = How much to upsample
◮ decimation_rate = How much to downsample
◮ floating_rate = Difference between the integer downsampling and the actual needed downsampling factor
◮ accumulated_rate = Accumulated difference between the integer filter skips and needed filter skips

Arbitrary Resampler: Did you notice the last 2 frametitles made sense?
◮ interpolation_rate = number of filters    (2)
◮ decimation_rate = floor(interpolation_rate / rate)    (3)
◮ floating_rate = (interpolation_rate / rate) - decimation_rate    (4)
◮ accumulated_rate in 2 steps:
    ◮ accumulated_rate += floating_rate    (5)
    ◮ accumulated_rate = accumulated_rate % 1.0    (6)
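Equations (2) to (6) translate almost directly into code. The sketch below assumes that `rate` is the output rate divided by the input rate; the bank size of 32 filters and the function names are illustrative choices, not values from the talk.

```python
import math


def resampler_rates(n_filters, rate):
    """Return (interpolation_rate, decimation_rate, floating_rate) per (2)-(4)."""
    interpolation_rate = n_filters                               # (2)
    decimation_rate = math.floor(interpolation_rate / rate)      # (3)
    floating_rate = interpolation_rate / rate - decimation_rate  # (4)
    return interpolation_rate, decimation_rate, floating_rate


def advance_accumulated_rate(accumulated_rate, floating_rate):
    """Apply (5) then (6); the integer part dropped by (6) is the extra filter skip."""
    accumulated_rate += floating_rate   # (5)
    return accumulated_rate % 1.0       # (6)


interp, dec, flt = resampler_rates(32, 0.9375)
print(interp, dec, round(flt, 4))  # 32 34 0.1333
```

With these numbers the resampler skips 34 filters per output and carries a fractional remainder of about 0.1333 filters into the accumulated rate.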

Arbitrary Resampler: I hope you remembered those equation numbers!
Filter skips and interpolation
◮ Calculate output_normal and output_diff of both filters at filter_index
◮ result = output_normal + accumulated_rate * output_diff    (7) (Interpolation)
◮ Update accumulated_rate according to (5)
◮ Update filter_index += decimation_rate + floor(accumulated_rate)    (8)
◮ Update accumulated_rate according to (6)
◮ Update input = input + filter_index / interpolation_rate    (9)
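The steps above can be sketched as a single CPU loop in Python. This is an illustration of the bookkeeping, not the CAFE implementation. It makes three assumptions: the division in (9) is an integer division, `filter_index` is wrapped modulo `interpolation_rate` afterwards, and each arm's taps are stored in the order that lines up with the input window. The toy bank below performs linear interpolation between neighbouring samples so the result is easy to check by eye.

```python
import math


def arbitrary_resample(samples, filters, diff_filters, rate):
    """Run steps (2)-(9) over `samples`; filters/diff_filters hold one tap list per arm."""
    interpolation_rate = len(filters)
    decimation_rate = math.floor(interpolation_rate / rate)
    floating_rate = interpolation_rate / rate - decimation_rate
    n_taps = len(filters[0])
    accumulated_rate = 0.0
    filter_index = 0
    input_index = 0
    outputs = []
    while input_index + n_taps <= len(samples):
        window = samples[input_index:input_index + n_taps]
        output_normal = sum(t * x for t, x in zip(filters[filter_index], window))
        output_diff = sum(t * x for t, x in zip(diff_filters[filter_index], window))
        outputs.append(output_normal + accumulated_rate * output_diff)  # (7)
        accumulated_rate += floating_rate                               # (5)
        filter_index += decimation_rate + math.floor(accumulated_rate)  # (8)
        accumulated_rate %= 1.0                                         # (6)
        input_index += filter_index // interpolation_rate               # (9), integer division
        filter_index %= interpolation_rate                              # assumed wrap-around
    return outputs


# Toy bank: 4 arms, each interpolating linearly between two neighbouring samples
n_arms = 4
filters = [[1 - k / n_arms, k / n_arms] for k in range(n_arms)]
diff_filters = [[-1 / n_arms, 1 / n_arms] for _ in range(n_arms)]
ramp = [float(i) for i in range(10)]
out = arbitrary_resample(ramp, filters, diff_filters, 0.75)
print([round(v, 3) for v in out])
```

On the ramp input, output k should land close to k / 0.75, which shows the differential filter filling in the positions between two adjacent arms.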

Transition to the GPU: You hear the music, don't you?

Transition to the GPU: One slide, sure...
CUDA in one slide:
◮ Used to launch operations in massively parallel fashion on the GPU
◮ Closely related to the NVIDIA GPU architecture
◮ Several multiprocessors, each with local on-chip memory and cache (fast)
◮ Several CUDA cores/ALUs per multiprocessor
◮ Large (but slow) global memory

Transition to the GPU: Told you it won't work
CUDA in one... several slides:
◮ CUDA divides operations into a grid of blocks
◮ Maps:
    ◮ Grid ⇒ GPU
    ◮ Block ⇒ Multiprocessor
    ◮ Thread ⇒ ALU
◮ Threads are scheduled in groups of 32 ⇒ Warps
◮ All threads in a block can use shared, fast on-chip memory

Transition to the GPU: As it is written in the sacred NVIDIA optimization guide
CUDA rules of thumb
◮ Launch more threads than your multiprocessor has ALUs ⇒ keeps the huge pipeline busy
◮ On-chip memory is waaaay faster than global memory
◮ Loads from both memories are done with a huge cacheline ⇒ have adjacent threads in a warp use adjacent memory entries ⇒ minimizes memory loads

Transition to the GPU: Where have I heard this before...
◮ Target the outputs of the PFB Channelizer ⇒ maximum use of the available cores
◮ One channel mapped to one CUDA block
◮ Each thread computes one resampler output
◮ Each thread computes both filter results and the interpolation
◮ Concurrency only through processing of multiple samples ⇒ minimal synchronization needed
◮ Same division as the PFB Channelizer

Transition to the GPU: Prayers to the floating point god
Filter calculations
◮ All filter updates are calculated on the GPU
◮ The filter processes all samples in its input
◮ Uncertainty in the number of produced output samples
◮ Precalculate the number of operations on the CPU
◮ Transfer the expected end filter and number of ops to the GPU before every run
◮ Dummy calculations might be done by a warp ⇒ take care of it when copying data back from the GPU
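One way to do the CPU-side precalculation is a dry run of the resampler bookkeeping without any filtering, sketched below in Python. How CAFE actually computes these values is not stated on the slide. The function names, the 32-filter bank and the rate of 0.9375 are assumptions for illustration, and the loop reuses the integer division and wrap-around assumed for equation (9).

```python
import math

WARP_SIZE = 32  # CUDA schedules threads in warps of 32


def plan_run(n_inputs, n_filters, rate, filter_index=0, accumulated_rate=0.0):
    """Dry run on the CPU: count outputs and find the filter state after the run."""
    decimation_rate = math.floor(n_filters / rate)
    floating_rate = n_filters / rate - decimation_rate
    n_outputs = 0
    input_index = 0
    while input_index < n_inputs:
        n_outputs += 1
        accumulated_rate += floating_rate
        filter_index += decimation_rate + math.floor(accumulated_rate)
        accumulated_rate %= 1.0
        input_index += filter_index // n_filters
        filter_index %= n_filters
    return n_outputs, filter_index, accumulated_rate


def padded_to_warps(n_outputs):
    """Thread count rounded up to whole warps; the surplus threads do dummy work."""
    return math.ceil(n_outputs / WARP_SIZE) * WARP_SIZE


n_out, end_filter, end_acc = plan_run(768, 32, 0.9375)
n_threads = padded_to_warps(n_out)
# Only the first n_out results are valid; the rest is warp padding to drop on copy-back.
print(n_out, n_threads, end_filter)
```

Because the accumulated rate is a floating point value, the output count for a given input length can differ by one between runs, which is why it is computed per run rather than fixed once.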

Transition to the GPU: Just imagine a fancy graphic
Results look promising for our use case
◮ Software runs on an Intel i7-6800K with an NVIDIA GTX 970 GPU
◮ Benchmarked the full chain PFB Channelizer + PFB Resampler
◮ 45 channels + 1550 tap prototype filter used
◮ 768 samples per channel processed in parallel
◮ Result ⇒ 25 MSamples/s average throughput

Open Sourcing: Call me Don Quijote
Harti (awesome colleague) and I have been battling since September to get it open sourced
Established an open sourcing process at IKN with me as the lab rat
◮ Check licenses
◮ Check export control
◮ Check with project partners and project sponsor/coordinator
◮ Establish CLA

Open Sourcing: What an excuse for this subpar presentation
◮ Still had to convince the institute management
◮ Several presentations on how open source benefits everyone (DLR and you gals and guys)
◮ Several written documents basically claiming the same as the presentations
◮ The whole project (and this talk) was in jeopardy
Finally on Monday we got the green light, 1 hour before I went on vacation...

Open Sourcing: Thanks Obama
Special thanks to these people at IKN
◮ Gianluigi Liva: group leader for the information transmission group at DLR Institute of Communication and Navigation (DLR IKN)
◮ Hartmut "Harti" Brandt: lead developer at the satellite communication group at DLR IKN

Open Sourcing: Thanks Obama
Even more special thanks to Joni Gerald
For all the Kung Fury inspiration!!
