Recursive Filter Code for GPUs Sepideh Maleki and Martin Burtscher - PowerPoint PPT Presentation

Automatic Generation of 1D Recursive Filter Code for GPUs Sepideh Maleki and Martin Burtscher

Based on Fibonacci Sequences
▪ Fibonacci numbers: 0, 1, 1, 2, 3, 5, 8, 13, 21, …
  ▪ Sum of previous two values ($F_i = F_{i-1} + F_{i-2}$)
▪ Tribonacci numbers: 0, 0, 1, 1, 2, 4, 7, 13, 24, …
  ▪ Sum of prior three values ($F_i = F_{i-1} + F_{i-2} + F_{i-3}$)
▪ (2, -3, 1)-Fibonacci numbers: 0, 0, 1, 2, 1, -3, -7, -4, …
  ▪ Weighted sum of prior values ($F_i = 2 F_{i-1} - 3 F_{i-2} + 1 F_{i-3}$)
▪ $(w_1, \dots, w_k)$-Fibonacci numbers: 0, …, 0, 1, $w_1$, $w_1^2 + w_2$, …
  ▪ Weighted sum of prior k values with $w_j \in \mathbb{R}$ ($F_i = w_1 F_{i-1} + w_2 F_{i-2} + \dots + w_k F_{i-k}$), called k-nacci numbers
http://www.storyofmathematics.com/medieval_fibonacci.html
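The sequences above can be reproduced with a few lines of Python. This is an illustrative sketch; the function name `knacci` is ours and is not part of the PLR tool. It seeds the sequence with k-1 zeros and a one, as in the lists above, and then applies the weighted sum.

```python
def knacci(weights, count):
    """Return the first `count` (w_1, ..., w_k)-Fibonacci numbers.

    Seeds with k-1 zeros followed by a one, then applies
    F_i = w_1*F_{i-1} + w_2*F_{i-2} + ... + w_k*F_{i-k}.
    """
    k = len(weights)
    seq = [0] * (k - 1) + [1]
    while len(seq) < count:
        # weights[j] multiplies the value j+1 positions back
        seq.append(sum(w * seq[-(j + 1)] for j, w in enumerate(weights)))
    return seq[:count]

print(knacci([1, 1], 9))      # [0, 1, 1, 2, 3, 5, 8, 13, 21]
print(knacci([1, 1, 1], 9))   # [0, 0, 1, 1, 2, 4, 7, 13, 24]
print(knacci([2, -3, 1], 8))  # [0, 0, 1, 2, 1, -3, -7, -4]
```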

Linear Recurrences
▪ Transform input sequence into output sequence: $x_0, \dots, x_{n-1} \rightarrow y_0, \dots, y_{n-1}$
▪ Our focus is on order-k linear recurrences with constant coefficients:
$$y_i = a_0 x_i + a_{-1} x_{i-1} + \dots + a_{-p} x_{i-p} + b_{-1} y_{i-1} + b_{-2} y_{i-2} + \dots + b_{-k} y_{i-k}$$
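A direct serial evaluation of this recurrence makes the definition concrete and serves as a reference for checking faster versions. This is a minimal sketch; the function name and the index convention (`a[j]` holds $a_{-j}$, `b[j]` holds $b_{-(j+1)}$) are our own assumptions, and values before the start of the sequence are treated as 0.

```python
def linear_recurrence(x, a, b):
    """Serial reference for the order-k recurrence
        y_i = a_0*x_i + a_-1*x_{i-1} + ... + a_-p*x_{i-p}
            + b_-1*y_{i-1} + ... + b_-k*y_{i-k},
    treating x_j and y_j as 0 for j < 0.
    Index convention (ours): a[j] holds a_-j and b[j] holds b_-(j+1).
    """
    y = []
    for i in range(len(x)):
        # feed-forward part: only reads the input
        acc = sum(a[j] * x[i - j] for j in range(len(a)) if i - j >= 0)
        # feed-back part: reads outputs computed in earlier iterations
        acc += sum(b[j] * y[i - 1 - j] for j in range(len(b)) if i - 1 - j >= 0)
        y.append(acc)
    return y

# a = [1], b = [1] gives y_i = x_i + y_{i-1}, i.e., a prefix sum
print(linear_recurrence([3, 2, -1, 8, -6, 1, -9, 5], [1], [1]))  # [3, 5, 4, 12, 6, 7, -2, 3]
# an impulse fed through y_i = x_i + y_{i-1} + y_{i-2} yields Fibonacci numbers
print(linear_recurrence([1, 0, 0, 0, 0, 0], [1], [1, 1]))         # [1, 1, 2, 3, 5, 8]
```

The loop is inherently sequential: iteration i cannot start before iterations i-1, …, i-k have finished.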

Importance of Linear Recurrences
▪ Linear recurrences appear in many domains
  ▪ Mathematics
  ▪ Random-number generation
  ▪ Data compression
  ▪ Finance and economics
  ▪ Biology
  ▪ Complexity analysis
  ▪ Parallel programming
    ▪ Prefix sums
  ▪ Telecommunication
    ▪ Digital filters
gamedsforum.ca

Prefix Sums
▪ Prefix sums are fundamental building blocks
  ▪ Help parallelize many seemingly serial algorithms
▪ Given a sequence of values (integer or real)
  3 2 -1 8 -6 1 -9 5
▪ Compute the sequence whose values are the sums of all values up to and including the current position of the original sequence
  3 5 4 12 6 7 -2 3
▪ $y_i = x_i + y_{i-1}$
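The example above can be checked by evaluating $y_i = x_i + y_{i-1}$ one step at a time. This sketch (function name ours) cross-checks the result against Python's standard `itertools.accumulate`, which computes an inclusive running sum by default.

```python
from itertools import accumulate

def prefix_sum(x):
    """Inclusive prefix sum computed directly from y_i = x_i + y_{i-1}, with y_{-1} = 0."""
    y, prev = [], 0
    for xi in x:
        prev = xi + prev  # one step of the recurrence
        y.append(prev)
    return y

values = [3, 2, -1, 8, -6, 1, -9, 5]
print(prefix_sum(values))         # [3, 5, 4, 12, 6, 7, -2, 3]
print(list(accumulate(values)))   # same result from the standard library
```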

Digital (Recursive) Filters
▪ IIR filters are fundamental DSP algorithms
  ▪ Used in telecommunication and audio DSP codes
  ▪ Digital equivalent of analog RC circuits
▪ Illustration
  ▪ High-pass filter: $y_i = 0.93 x_i - 0.93 x_{i-1} + 0.86 y_{i-1}$
(The Scientist and Engineer's Guide to Digital Signal Processing, by Steven W. Smith)
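To see the high-pass behavior, the filter above can be applied to a step input (a constant signal switched on at i = 0). In this sketch (function name ours), the constant part is removed: after the first sample, $0.93 x_i - 0.93 x_{i-1} = 0$, so the output simply decays by the factor 0.86 per step, giving $y_i = 0.93 \cdot 0.86^i$.

```python
def high_pass(x):
    """Single-pole high-pass filter from the slide:
    y_i = 0.93*x_i - 0.93*x_{i-1} + 0.86*y_{i-1}, with x_{-1} = y_{-1} = 0."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xi in x:
        yi = 0.93 * xi - 0.93 * x_prev + 0.86 * y_prev
        y.append(yi)
        x_prev, y_prev = xi, yi
    return y

step = [1.0] * 8
out = high_pass(step)
print([round(v, 4) for v in out])  # decays toward 0: the DC component is filtered out
```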

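Parallelization Difficulty
▪ Recurrence equation ($x_j = 0$, $y_j = 0$, $\forall j < 0$):
$$y_i = a_0 x_i + a_{-1} x_{i-1} + \dots + a_{-p} x_{i-p} + b_{-1} y_{i-1} + b_{-2} y_{i-2} + \dots + b_{-k} y_{i-k}$$
▪ Computation of element $y_i$
  ▪ Input sequence $\dots, x_{i-p}, \dots, x_{i-1}, x_i, x_{i+1}, \dots$ is given and read-only; $x_{i-p}, \dots, x_{i-1}, x_i$ are weighted by $a_{-p}, \dots, a_{-1}, a_0$
  ▪ The $a_j$ are the non-recursion (feed-forward) coefficients
  ▪ The $b_j$ are the recursion (feed-back) coefficients
  ▪ k denotes the order of the recurrence
  ▪ Output sequence $\dots, y_{i-k}, \dots, y_{i-2}, y_{i-1}, y_i, y_{i+1}, \dots$ is both written and read; $y_{i-k}, \dots, y_{i-2}, y_{i-1}$ are weighted by $b_{-k}, \dots, b_{-2}, b_{-1}$ and summed into $y_i$
▪ Data dependency! Computing $y_i$ requires the previously computed outputs $y_{i-1}, \dots, y_{i-k}$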

Simplified Notation
▪ Recurrence equation
$$y_i = a_0 x_i + a_{-1} x_{i-1} + \dots + a_{-p} x_{i-p} + b_{-1} y_{i-1} + b_{-2} y_{i-2} + \dots + b_{-k} y_{i-k}$$
▪ Signature: $(a_0, a_{-1}, \dots, a_{-p} : b_{-1}, b_{-2}, \dots, b_{-k})$
  ▪ Lists only the non-recursion and recursion coefficients, in parentheses and separated by a colon
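A signature is compact enough to be parsed mechanically. The helper below is hypothetical (it is not the PLR tool's input parser); it simply splits a signature string at the colon into the non-recursion coefficients $[a_0, a_{-1}, \dots]$ and the recursion coefficients $[b_{-1}, b_{-2}, \dots]$.

```python
def parse_signature(sig):
    """Hypothetical helper: split a signature such as "(0.9, -0.9 : 0.8)"
    into the non-recursion coefficients [a_0, a_-1, ...] and the
    recursion coefficients [b_-1, b_-2, ...]."""
    body = sig.strip().strip("()")
    left, right = body.split(":")
    a = [float(t) for t in left.split(",")]   # float() tolerates surrounding spaces
    b = [float(t) for t in right.split(",")]
    return a, b

print(parse_signature("(1 : 1)"))                           # ([1.0], [1.0])
print(parse_signature("(0.81, -1.62, 0.81 : 1.6, -0.64)"))  # ([0.81, -1.62, 0.81], [1.6, -0.64])
```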

Signature Examples
▪ Standard prefix sum
  ▪ Prefix sum over scalar values: 3 2 -1 8 -6 1 -9 5 → 3 5 4 12 6 7 -2 3
  ▪ (1 : 1)
▪ Low-pass digital filters
  ▪ Retain low frequencies but dampen high frequencies
  ▪ 1-stage (0.2 : 0.8), 2-stage (0.04 : 1.6, -0.64), etc.
▪ High-pass digital filters
  ▪ Retain high frequencies
  ▪ 1-stage (0.9, -0.9 : 0.8)
  ▪ 2-stage (0.81, -1.62, 0.81 : 1.6, -0.64)
(Steven W. Smith)
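The 2-stage signatures correspond to running the matching 1-stage filter twice in a row (cascading): $0.2^2 = 0.04$, $2 \cdot 0.8 = 1.6$, $-0.8^2 = -0.64$, and $0.9^2 = 0.81$, $-2 \cdot 0.81 = -1.62$. The sketch below (helper name ours) checks this numerically on an arbitrary test input.

```python
def apply_signature(x, a, b):
    """Evaluate the signature (a_0, a_-1, ... : b_-1, b_-2, ...) on x,
    with x_j = y_j = 0 for j < 0."""
    y = []
    for i in range(len(x)):
        acc = sum(a[j] * x[i - j] for j in range(len(a)) if i >= j)
        acc += sum(b[j] * y[i - 1 - j] for j in range(len(b)) if i - 1 - j >= 0)
        y.append(acc)
    return y

x = [1.0, 0.0, 0.0, 2.0, -1.0, 0.5, 3.0, 0.0]

low2 = apply_signature(x, [0.04], [1.6, -0.64])
low1x2 = apply_signature(apply_signature(x, [0.2], [0.8]), [0.2], [0.8])

high2 = apply_signature(x, [0.81, -1.62, 0.81], [1.6, -0.64])
high1x2 = apply_signature(apply_signature(x, [0.9, -0.9], [0.8]), [0.9, -0.9], [0.8])

# both differences are only floating-point rounding
print(max(abs(p - q) for p, q in zip(low2, low1x2)))
print(max(abs(p - q) for p, q in zip(high2, high1x2)))
```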

Separation into Map + Recurrence
▪ Original recurrence
$$y_i = a_0 x_i + a_{-1} x_{i-1} + \dots + a_{-p} x_{i-p} + b_{-1} y_{i-1} + b_{-2} y_{i-2} + \dots + b_{-k} y_{i-k}$$
▪ Equivalent map and simpler recurrence
  ▪ Map operation: $t_i = a_0 x_i + a_{-1} x_{i-1} + \dots + a_{-p} x_{i-p}$, i.e., $(a_0, \dots, a_{-p} : 0)$
  ▪ Recurrence: $y_i = t_i + b_{-1} y_{i-1} + b_{-2} y_{i-2} + \dots + b_{-k} y_{i-k}$, i.e., $(1 : b_{-1}, \dots, b_{-k})$
▪ Benefit: easier to parallelize
  ▪ The recurrence always has the (1 : ...) format; the map is trivial because each $t_i$ depends only on the read-only input
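The split can be checked numerically: computing the map first and then the (1 : ...) recurrence gives the same outputs as evaluating the original recurrence directly. This is a sequential sketch with our own function names; it only illustrates the algebra, not how PLR schedules the two steps on a GPU.

```python
def map_step(x, a):
    """(a_0, ..., a_-p : 0): each t_i reads only the input, so all
    elements can be computed independently of each other."""
    return [sum(a[j] * x[i - j] for j in range(len(a)) if i >= j)
            for i in range(len(x))]

def recurrence_step(t, b):
    """(1 : b_-1, ..., b_-k): the only part with a loop-carried dependency."""
    y = []
    for i, ti in enumerate(t):
        y.append(ti + sum(b[j] * y[i - 1 - j] for j in range(len(b)) if i - 1 - j >= 0))
    return y

def direct(x, a, b):
    """Original recurrence evaluated in one pass, for comparison."""
    y = []
    for i in range(len(x)):
        acc = sum(a[j] * x[i - j] for j in range(len(a)) if i >= j)
        acc += sum(b[j] * y[i - 1 - j] for j in range(len(b)) if i - 1 - j >= 0)
        y.append(acc)
    return y

x = [3.0, 2.0, -1.0, 8.0, -6.0, 1.0, -9.0, 5.0]
a, b = [0.81, -1.62, 0.81], [1.6, -0.64]   # 2-stage high-pass signature
split = recurrence_step(map_step(x, a), b)
print(max(abs(p - q) for p, q in zip(split, direct(x, a, b))))
```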

Our PLR Approach
▪ High-level idea (prefix-sum example)
  Input: 3 2 -1 8 -6 1 -9 5 4 -1 5 -8
▪ Break input into chunks of size 1 (trivial)
  3 | 2 | -1 | 8 | -6 | 1 | -9 | 5 | 4 | -1 | 5 | -8
▪ Iteratively combine adjacent chunks into larger chunks
  3 5 | -1 7 | -6 -5 | -9 -4 | 4 3 | 5 -3
  3 5 4 12 | -6 -5 -14 -9 | 4 3 8 0
▪ Two phases
  1. Merging (builds larger chunks from smaller ones, as above)
  2. Pipelining* (propagates the carry from each chunk into the next): 3 5 4 12 | 6 7 -2 3 | 7 6 11 3
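The two phases of the example can be modeled sequentially in Python. This sketch is specific to the prefix sum (1 : 1), where merging just adds the last value of the left chunk to every element of the right chunk; the function names are ours, and the real tool runs these phases in parallel on the GPU.

```python
def merge_phase(x, chunk):
    """Phase 1: starting from chunks of size 1, repeatedly merge adjacent
    chunk pairs until chunks reach size `chunk` (a power of two).
    For a prefix sum, merging adds the last value of the left chunk to
    every element of the right chunk."""
    v = list(x)
    size = 1
    while size < chunk:
        for start in range(0, len(v), 2 * size):
            carry_idx = start + size - 1           # last element of left chunk
            for i in range(start + size, min(start + 2 * size, len(v))):
                v[i] += v[carry_idx]
        size *= 2
    return v

def pipeline_phase(v, chunk):
    """Phase 2: propagate the carry (last fully corrected value) of each
    chunk into the next chunk, in order."""
    v = list(v)
    for start in range(chunk, len(v), chunk):
        carry = v[start - 1]
        for i in range(start, min(start + chunk, len(v))):
            v[i] += carry
    return v

x = [3, 2, -1, 8, -6, 1, -9, 5, 4, -1, 5, -8]
merged = merge_phase(x, 4)
print(merged)                     # [3, 5, 4, 12, -6, -5, -14, -9, 4, 3, 8, 0]
print(pipeline_phase(merged, 4))  # [3, 5, 4, 12, 6, 7, -2, 3, 7, 6, 11, 3]
```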

PLR Merging (1 : d)
▪ Merging two adjacent chunks (example for d = 1: 3 2 -1 8 → 3 5 | -1 7 → 3 5 4 12)
  ▪ $v_0, v_1, \dots, v_{m-1} \mid v_m, v_{m+1}, \dots, v_{2m-1}$
▪ Correcting element $v_m$
  ▪ Per (1 : d), need to add d times the prior element $v_{m-1}$
  ▪ The correction term is $d \cdot v_{m-1}$
▪ Correcting element $v_{m+1}$
  ▪ Need to add d times the corrected prior element
  ▪ Already added d times $v_m$ in an earlier iteration
  ▪ Only need to add d times the prior correction term
  ▪ The correction term is $d \cdot d \cdot v_{m-1}$
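The correction rule above can be sketched as follows: the first correction is $d \cdot v_{m-1}$, and each later correction is d times the previous one. Function names are ours; the check compares the merged chunks against a serial pass over the whole input.

```python
def serial_1d(x, d):
    """Serial (1 : d): y_i = x_i + d*y_{i-1}, starting from y_{-1} = 0."""
    y, prev = [], 0
    for xi in x:
        prev = xi + d * prev
        y.append(prev)
    return y

def merge_1d(left, right, d):
    """Merge two adjacent chunks that were each computed with (1 : d) as if
    they started from zero. The correction for right[0] is d*left[-1];
    each later correction is d times the previous correction."""
    merged = list(left) + list(right)
    m = len(left)
    corr = d * left[-1]
    for i in range(m, m + len(right)):
        merged[i] += corr
        corr *= d
    return merged

# d = 1 reproduces the slide's example: 3 5 | -1 7 -> 3 5 4 12
print(merge_1d([3, 5], [-1, 7], 1))   # [3, 5, 4, 12]

# the merged result matches a serial pass over the whole input
x, d = [3.0, 2.0, -1.0, 8.0, -6.0, 1.0, -9.0, 5.0], 0.5
merged = merge_1d(serial_1d(x[:4], d), serial_1d(x[4:], d), d)
print(max(abs(p - q) for p, q in zip(merged, serial_1d(x, d))))
```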

PLR Merging (1 : d) cont.
▪ Correcting elements $v_{m+2}, v_{m+3}$, etc.
  ▪ The correction terms are $d^3 \cdot v_{m-1}$, $d^4 \cdot v_{m-1}$, etc.
  ▪ Correction factor times carry $v_{m-1}$ from prior chunk
▪ Key observation
  ▪ Carry value depends on input sequence
  ▪ Correction factors only depend on recurrence
  ▪ Can be precomputed as they are the same for all inputs
▪ Just the correction factors
  ▪ Start with 1, apply recurrence (0 : d) → 1 | $d, d^2, d^3, \dots, d^m$
  ▪ All factors are 1 for d = 1; prefix sum is trivial base case
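Because the factors depend only on d and the chunk size m, they can be computed once and reused for every input. This sketch (names ours) generates them by seeding with 1 and applying (0 : d), then corrects a right-hand chunk with "factor times carry".

```python
def correction_factors(d, m):
    """Precompute the m correction factors for (1 : d) by seeding with 1
    and applying the recurrence (0 : d): 1 | d, d^2, ..., d^m."""
    factors, prev = [], 1.0
    for _ in range(m):
        prev = d * prev          # (0 : d) has no input term; it only scales the prior value
        factors.append(prev)
    return factors

def merge_with_factors(left, right, factors):
    """Correct the right chunk using the carry left[-1] and the precomputed factors."""
    carry = left[-1]
    return list(left) + [r + f * carry for r, f in zip(right, factors)]

print(correction_factors(1, 4))    # [1.0, 1.0, 1.0, 1.0]
print(correction_factors(0.5, 4))  # [0.5, 0.25, 0.125, 0.0625]
print(merge_with_factors([3, 5, 4, 12], [-6, -5, -14, -9], correction_factors(1, 4)))
```

With d = 1 the last call reproduces the pipelined chunk 6 7 -2 3 from the prefix-sum example.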

PLR Merging (1 : d, e)
▪ Merging two adjacent chunks
  ▪ $v_0, v_1, \dots, v_{m-2}, v_{m-1} \mid v_m, v_{m+1}, \dots, v_{2m-1}$
▪ Correcting element $v_m$
  ▪ Per (1 : d, e), need to add d times $v_{m-1}$ plus e times $v_{m-2}$
  ▪ The correction term is $d \cdot v_{m-1} + e \cdot v_{m-2}$
▪ Correcting element $v_{m+1}$
  ▪ Need to add d times $(d \cdot v_{m-1} + e \cdot v_{m-2})$ plus e times $v_{m-1}$
  ▪ The correction term is $d (d \cdot v_{m-1} + e \cdot v_{m-2}) + e \cdot v_{m-1}$, which is $(d^2 + e) \cdot v_{m-1} + (de) \cdot v_{m-2}$ after rearranging the terms

PLR Merging (1 : d, e) cont.
▪ Correcting elements $v_{m+2}, v_{m+3}$, etc.
  ▪ The correction terms are $(d^3 + 2de) \cdot v_{m-1} + (d^2 e + e^2) \cdot v_{m-2}$, $(d^4 + 3d^2 e + e^2) \cdot v_{m-1} + (d^3 e + 2de^2) \cdot v_{m-2}$, etc.
▪ There are two carries, $v_{m-1}$ and $v_{m-2}$, from the prior chunk
  ▪ Because the recurrence (1 : d, e) has order 2
▪ Just the correction factors for $v_{m-1}$
  ▪ $d,\ d^2 + e,\ d^3 + 2de,\ d^4 + 3d^2 e + e^2, \dots$
▪ Just the correction factors for $v_{m-2}$
  ▪ $e,\ de,\ d^2 e + e^2,\ d^3 e + 2de^2, \dots$
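The closed-form factors derived for (1 : d, e) can be checked by merging two independently computed chunks and comparing with a serial pass. This sketch (names ours) hard-codes the first four factors for each carry, so it is limited to right-hand chunks of at most four elements; the test values d = 1.6, e = -0.64 are the recursion coefficients of the 2-stage low-pass filter.

```python
def serial_2(x, d, e):
    """Serial (1 : d, e): y_i = x_i + d*y_{i-1} + e*y_{i-2}, zero initial state."""
    y1 = y2 = 0.0   # y_{i-1}, y_{i-2}
    out = []
    for xi in x:
        yi = xi + d * y1 + e * y2
        out.append(yi)
        y1, y2 = yi, y1
    return out

def merge_order2(left, right, d, e):
    """Correct `right` with the closed-form factors for the carries
    v_{m-1} = left[-1] and v_{m-2} = left[-2] (only the first four are listed)."""
    assert len(right) <= 4
    f1 = [d, d**2 + e, d**3 + 2*d*e, d**4 + 3*d**2*e + e**2]   # factors for v_{m-1}
    f2 = [e, d*e, d**2*e + e**2, d**3*e + 2*d*e**2]            # factors for v_{m-2}
    c1, c2 = left[-1], left[-2]
    return list(left) + [r + g1 * c1 + g2 * c2 for r, g1, g2 in zip(right, f1, f2)]

d, e = 1.6, -0.64
x = [3.0, 2.0, -1.0, 8.0, -6.0, 1.0, -9.0, 5.0]
merged = merge_order2(serial_2(x[:4], d, e), serial_2(x[4:], d, e), d, e)
print(max(abs(p - q) for p, q in zip(merged, serial_2(x, d, e))))  # rounding only
```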

PLR Merging (1 : d, e) cont.
▪ Correction factors for $v_{m-1}$
  ▪ $d,\ d^2 + e,\ d^3 + 2de,\ d^4 + 3d^2 e + e^2, \dots$
▪ Correction factors for $v_{m-2}$
  ▪ $e,\ de,\ d^2 e + e^2,\ d^3 e + 2de^2, \dots$
▪ Both sequences can be generated by (0 : d, e)
  ▪ 0, 1 | $d,\ d^2 + e,\ d^3 + 2de,\ d^4 + 3d^2 e + e^2, \dots$
  ▪ 1, 0 | $e,\ de,\ d^2 e + e^2,\ d^3 e + 2de^2, \dots$
  ▪ The "1" indicates the location of the carry in the prior chunk
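The claim that (0 : d, e) generates both factor lists can be verified with small integer values, which keeps the comparison exact. In this sketch (function name ours), the seed lists the two prior-chunk values in the order $v_{m-2}, v_{m-1}$.

```python
def factors_from_seed(seed, d, e, count):
    """Apply (0 : d, e) to a two-element seed [v_{m-2}, v_{m-1}] and
    return the next `count` values (the correction factors)."""
    y2, y1 = seed             # y2 = value two back, y1 = value one back
    out = []
    for _ in range(count):
        y = d * y1 + e * y2   # input term is 0
        out.append(y)
        y2, y1 = y1, y
    return out

d, e = 2, 3   # integer test values so the comparison is exact
f_carry1 = factors_from_seed([0, 1], d, e, 4)   # "1" at the position of v_{m-1}
f_carry2 = factors_from_seed([1, 0], d, e, 4)   # "1" at the position of v_{m-2}
print(f_carry1 == [d, d**2 + e, d**3 + 2*d*e, d**4 + 3*d**2*e + e**2])  # True
print(f_carry2 == [e, d*e, d**2*e + e**2, d**3*e + 2*d*e**2])           # True
```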

PLR Merging $(1 : b_{-1}, b_{-2}, \dots, b_{-k})$
▪ Correction-factor computation
  ▪ Recurrence has order k, so k lists of factors are needed
  ▪ Start with k-1 zeros and a one: 0, …, 0, 1, 0, …, 0
    ▪ The "1" is in the location of the corresponding carry
  ▪ Compute factors using $(0 : b_{-1}, b_{-2}, \dots, b_{-k})$
▪ Correction factors are k-nacci sequences (generalized Fibonacci sequences)
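The general rule can be modeled sequentially for any order k. This is a sketch under our own naming and index conventions (`b[j]` holds $b_{-(j+1)}$), not the CUDA code PLR generates: one factor list per carry, each produced by (0 : b_-1, …, b_-k) from a seed with a single 1, and the merged chunk is checked against a serial pass.

```python
def serial(x, b):
    """Serial (1 : b_-1, ..., b_-k) with zero initial state; b[j] is b_-(j+1)."""
    y = []
    for i, xi in enumerate(x):
        y.append(xi + sum(b[j] * y[i - 1 - j] for j in range(len(b)) if i - 1 - j >= 0))
    return y

def factor_lists(b, m):
    """One list of m correction factors per carry. For carry c (c = 1 means
    v_{m-1}, c = k means v_{m-k}) the seed is k values with a single 1 at
    that carry's position; the factors continue it with (0 : b_-1, ..., b_-k)."""
    k = len(b)
    lists = []
    for c in range(1, k + 1):
        seq = [0.0] * k
        seq[k - c] = 1.0                 # seq[-1] is v_{m-1}, seq[-k] is v_{m-k}
        for _ in range(m):
            seq.append(sum(b[j] * seq[-1 - j] for j in range(k)))
        lists.append(seq[k:])
    return lists

def merge(left, right, b, factors):
    """Add sum over c of factors[c-1][j] * v_{m-c} to each right[j].
    Requires len(left) >= k and len(factors[c]) >= len(right)."""
    k = len(b)
    carries = [left[-c] for c in range(1, k + 1)]
    return list(left) + [r + sum(f[j] * cv for f, cv in zip(factors, carries))
                         for j, r in enumerate(right)]

b = [0.5, -0.2, 0.1]                     # an arbitrary order-3 example
x = [3.0, 2.0, -1.0, 8.0, -6.0, 1.0, -9.0, 5.0]
merged = merge(serial(x[:4], b), serial(x[4:], b), b, factor_lists(b, 4))
print(max(abs(p - q) for p, q in zip(merged, serial(x, b))))  # rounding only
```

For k = 2 with b = [d, e], `factor_lists` reproduces the two (0 : d, e) factor lists of the previous slide.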

PLR: Proof-of-Concept Tool
▪ PLR code generator
  ▪ Compiles signature into CUDA code for GPUs
  ▪ Performs domain-specific code optimizations
▪ Generated code
  ▪ Performs map operation $(a_0, a_{-1}, \dots, a_{-p} : 0)$
  ▪ Computes recurrence $(1 : b_{-1}, b_{-2}, \dots, b_{-k})$
    ▪ First five merge steps are done at warp level
    ▪ Remaining merge steps are done at thread-block level
    ▪ Pipelining is performed at grid level*
  ▪ Uses chunk sizes $m \le 9 \cdot 1024$ for floats and $m \le 11 \cdot 1024$ for ints
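Since each merge step doubles the chunk size, five steps grow chunks from 1 to $2^5 = 32$ elements, the number of threads in an NVIDIA warp. The CPU model below illustrates this for the prefix sum (1 : 1) only. It is a simplified sketch under our own assumptions (one element per lane, lock-step updates) and does not reflect how PLR's generated code actually exchanges values between threads.

```python
WARP_SIZE = 32  # threads (lanes) per warp on NVIDIA GPUs

def warp_merge_prefix_sum(lane_values):
    """Simplified model (assumption: one element per lane, recurrence (1 : 1)).
    Step s merges adjacent chunks of size 2**s; after 5 steps the whole
    warp forms one chunk of 32 elements."""
    assert len(lane_values) == WARP_SIZE
    v = list(lane_values)
    size, steps = 1, 0
    while size < WARP_SIZE:
        old = list(v)                                 # all lanes read before any lane writes
        for lane in range(WARP_SIZE):
            if (lane // size) % 2 == 1:               # lane belongs to a right-hand chunk
                carry_lane = (lane // size) * size - 1  # last lane of the left chunk
                v[lane] = old[lane] + old[carry_lane]
        size *= 2
        steps += 1
    return v, steps

vals = [(-1) ** lane * lane for lane in range(WARP_SIZE)]
res, steps = warp_merge_prefix_sum(vals)
print(steps)                                                   # 5
print(res == [sum(vals[: i + 1]) for i in range(WARP_SIZE)])   # True
```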

Experimental Methodology
▪ GPU
  ▪ GeForce GTX Titan X (1.1 GHz cores, 3.5 GHz memory)
  ▪ 3072 cores, 24 SMs, up to 49,152 active threads
  ▪ 2 MB L2 cache, 12 GB of global memory (336 GB/s)
▪ Compiler and flags
  ▪ nvcc 7.5 with "-O3 -arch=sm_52"
▪ Comparison codes
  ▪ Prefix sums: CUB (Nvidia), SAM (ours), Scan (CMU)
  ▪ Digital filters: Alg3 (IMPA), Rec (Halide/MIT), Scan
  ▪ All downloaded except Scan (uses CUB's scan)
