SLIDE 1
GPU Accelerated Machine Learning for Bond Price Prediction
Venkat Bala, Rafael Nicolas Fermin Cota
SLIDE 2 Motivation
Primary Goals
- Demonstrate potential benefits of using GPUs over CPUs for machine learning
- Exploit inherent parallelism to improve model performance
- Real world application using a bond trade dataset
SLIDE 3 Highlights
Ensemble
- Bagging: Train independent regressors on equal-sized bags of samples
- Generally, performance is superior to any single individual regressor
- Scalable: Each individual model can be trained independently and in parallel
Hardware Specifications
- CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
- GPU: GeForce GTX 1080 Ti
- RAM: 1 TB (DDR4, 2400 MHz)
SLIDE 4 Bond Trade Dataset
Feature Set
- 100+ features per trade
- Trade Size/Historical Features
- Coupon Rate/Time to Maturity
- Bond Rating
- Trade Type: Buy/Sell
- Reporting Delays
- Current Yield/Yield To Maturity
Response
- Bond trade price (prediction target)
SLIDE 5
Modeling Approach
SLIDE 6
The Machine Learning Pipeline
[Pipeline diagram: Data Processing → Training Set / CV-Test Set → Model Building → Evaluate → Deploy]
Accelerate each stage in the pipeline for maximum performance
SLIDE 7 Data Preprocessing
Exposing Data Parallelism
- Important stage in the pipeline (garbage in → garbage out)
- Many models rely on input data being on the same scale
- Standardization, log transformations, imputations, polynomial/non-linear feature generation, etc.
- In most cases there is no data dependence, so each operation can be executed independently
- Significant speedups can be obtained using GPUs, given sufficient data/computation
SLIDE 8
Data Preprocessing: Sequential Approach
Apply function F (·) sequentially to each element in a feature column
[Diagram: F(·) applied one element at a time to a0, a1, ..., aN]
SLIDE 9
Data Preprocessing: Parallel Approach
Apply function F (·) in parallel to each element in a feature column
[Diagram: F(·) applied concurrently to each of a0, a1, ..., aN, producing b0, b1, ..., bN]
SLIDE 10 Programming Details
Implementation Basics
- Task is embarrassingly parallel
- Improve CPU code performance
- Auto-vectorization + compiler optimizations
- Using performance libraries (Intel MKL)
- Adopting Threaded (OpenMP)/Distributed computing (MPI) approaches
- Great application case for GPUs
- Offload computations onto the GPU via CUDA kernels
- Launch as many threads as there are data elements
- Launch several kernels concurrently using CUDA streams
SLIDE 11 Toy Example: Speedup Over Sequential C++
- Log transformation of an array of floats
- N = 2^p elements, p = log2(N)
[Plot: speedup over sequential C++ vs. p (18-23) for vectorized C++ and CUDA]
SLIDE 12 Bond Dataset Preprocessing
Applied Transformations
- Log transformation of highly skewed features (Trade Size, Time to Maturity)
- Standardization (Trade Price & historical prices)
- Missing value imputation
- Winsorizing features to handle outliers
- Feature generation (Price differences, Yield measurements)
Implementation Details
- CPU: C++ implementation using Intel MKL/Armadillo
- GPU: CUDA
SLIDE 13 GPU Speedup over CPU implementation
- Nearly 10x speedup obtained after CUDA optimizations
[Plot: speedup over CPU vs. p (20-25) for unoptimized and optimized CUDA]
SLIDE 14 CUDA Optimizations
Standard Tricks
- Concurrent kernel execution using CUDA streams to maximize GPU utilization
- Use of optimized libraries such as cuBLAS/Thrust
- Coalesced memory access
- Maximizing memory bandwidth for operations with low arithmetic intensity
- Caching using GPU shared memory
SLIDE 15
Model Building
SLIDE 16 Ensemble Model
Model Choices
- GBT: XGBoost, DNN: TensorFlow/Keras
[Diagram: ensemble model combining GBT models and a DNN]
SLIDE 17 Hyperparameter Tuning: Hyperopt
GBT: XGBoost
- Learning Rate
- Max depth
- Minimum child weight
- Subsample, Colsample-bytree
- Regularization parameters
DNN: MLPs
- Learning Rate/Decay Rate
- Batch Size
- Epochs
- Hidden layers/Layer width
- Activations/Dropouts
SLIDE 18
Hyperparameter Tuning: Hyperopt
[Plot: sampled learning rate (0.0-1.0) over 1000 Hyperopt iterations]
SLIDE 19 XGBoost: Training & Hyperparameter Optimization Time
- GBT speedup ≈ 3x: GTX 1080 Ti vs. Intel(R) Xeon(R) E5-2699 (32 cores)
[Bar chart: GPU vs. CPU training & hyperparameter optimization time]
SLIDE 20
TensorFlow/Keras Time Per Epoch
- Speedup ≈ 3x: GTX 1080 Ti vs. Intel(R) Xeon(R) E5-2699 (32 cores)
[Plot: time per epoch (s) vs. p (15-18)]
SLIDE 21
Model Test Set Performance
[Scatter plot: predicted vs. actual trade prices (range 20-160)]
Test set R²: 0.9858
SLIDE 22
Summary
SLIDE 23 Summary
Final Remarks
- Leveraging GPU compute power → dramatic speedups
- Maximum performance when GPUs are incorporated into every stage of the pipeline
- Ensembles: bagging/boosting to improve model accuracy/throughput
- Shorter training times allow more experimentation
- Extensive library and tooling support available
- Deploying this pipeline on our in-house DGX-1
SLIDE 24
Questions?