Bayesian Inference and Markov Chain Monte Carlo Algorithms on GPUs
Alexander Terenin and David Draper, University of California, Santa Cruz
Joint work with Shawfeng Dong
Talk for the Nvidia GPU Technology Conference, May 11, 2017
arXiv:1608.04329
What are we trying to do?
Statistical machine learning and artificial intelligence at scale

arg min_θ { L(x, θ) + ||θ|| }

- L(x, θ): loss function
- ||θ||: regularization term

Goal: minimize the loss
- Typical approach: stochastic gradient descent
Alternative approach: rewrite the loss as an instance of Bayes' Rule
Bayesian Representation of Statistical Machine Learning
Consider the exponential of the loss:

f(x | θ) ∝ exp{−c L(x, θ)}
π(θ) ∝ exp{−||θ||}

- f(x | θ): likelihood
- π(θ): prior

loss function ⇔ posterior distribution

New goal: draw samples from f(θ | x)
- A lot like non-convex optimization
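Example (not on the slide): with squared-error loss and an ℓ2 penalty, exp{−L(x, θ)} is a Gaussian likelihood and exp{−||θ||²/(2λ²)} is a N(0, λ²) prior, so ridge regression is exactly the posterior mode of Bayesian linear regression with a Gaussian prior; sampling from f(θ | x) explores the whole posterior surface rather than just locating that mode.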
What are we trying to do?
Goal: draw samples from f(θ | x)

Some difficulties at scale:
- Big Data: large x, computations using all data will be slow
- Complex Models: large θ (its dimension can exceed that of x): curse of dimensionality
Why not just find the maximum?
- Understand, quantify, and propagate uncertainty
- Sampling algorithms are essentially global optimizers
- Loss may have no analytic form, making SGD impractical
Hardware
Bayesian inference is inherently expensive: let’s parallelize it
- Parallelizable: only has meaning in context
- Different types of parallel hardware have different requirements
GPUs: main challenges
- Memory bottleneck: limited RAM, may need to stream data
- Warp divergence: fine-grained if/else → if, wait, else
GPUs: design goals
- Expose fine-grained parallelism
- Minimize branching to control warp divergence
- Ideally: run out-of-core (i.e. on minibatches streaming off disk)
Gibbs Sampling
The canonical Bayesian sampling algorithm

Draws samples from a target with density f(x, y, z) sequentially
- Full conditionals: f(x | y, z), f(y | x, z), f(z | x, y)
Algorithm: Gibbs Sampling
- Step 1: draw x1 | y0, z0
- Step 2: draw y1 | x1, z0
- Step 3: draw z1 | x1, y1

Repeat until convergence to f(x, y, z)
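As a concrete toy instance of this scheme (not from the talk), the sketch below runs a two-variable Gibbs sampler whose full conditionals are the standard ones for a bivariate normal with correlation rho; all names and settings are illustrative only.

```python
# Toy Gibbs sampler: bivariate normal with correlation rho, sampled via its
# two full conditionals x | y ~ N(rho*y, 1 - rho^2) and y | x ~ N(rho*x, 1 - rho^2).
import numpy as np

def gibbs_bivariate_normal(n_iter=5000, rho=0.8, seed=0):
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0                      # arbitrary starting point
    s = np.sqrt(1.0 - rho ** 2)          # conditional standard deviation
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.normal(rho * y, s)       # Step 1: draw x | y
        y = rng.normal(rho * x, s)       # Step 2: draw y | x
        samples[t] = (x, y)              # after burn-in, (x, y) ~ the joint target
    return samples

samples = gibbs_bivariate_normal()
print(samples[1000:].mean(axis=0), np.corrcoef(samples[1000:].T)[0, 1])
```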
How do we parallelize this?
GPU-accelerated Gibbs Sampling
Start with an exchangeable model: f(x | θ) = ∏_{i=1}^N f(xi | θ)

Example: Probit Regression
yi | zi = round[Φ(zi)]
zi | xi, β ∼ N(xiβ, 1)
β ∼ N(µ, λ²)

Data Augmentation Gibbs Sampler
zi | β ∼ TN(xiβ, 1, yi)
β | z ∼ N((XᵀX)⁻¹Xᵀz, (XᵀX)⁻¹)
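A minimal CPU sketch of this data-augmentation sampler, using the β | z conditional exactly as written above (which amounts to a flat prior on β; the N(µ, λ²) prior on the slide would add the usual prior terms). Function and variable names (probit_gibbs, X, y, n_iter) are illustrative, not from the talk's GPU implementation.

```python
# CPU sketch of the data-augmentation Gibbs sampler for probit regression.
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX = X.T @ X
    L = np.linalg.cholesky(XtX)               # X^T X = L L^T, reused every iteration
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    # truncation region for z_i: (0, inf) if y_i = 1, (-inf, 0) if y_i = 0
    lower = np.where(y == 1, 0.0, -np.inf)
    upper = np.where(y == 1, np.inf, 0.0)
    for t in range(n_iter):
        mu = X @ beta
        # z_i | beta ~ TN(x_i beta, 1, y_i); the z_i are independent, so vectorize
        z = truncnorm.rvs(lower - mu, upper - mu, loc=mu, scale=1.0, random_state=rng)
        # beta | z ~ N((X^T X)^{-1} X^T z, (X^T X)^{-1}), drawn via the Cholesky factor:
        # beta = m + L^{-T} eps has covariance (L L^T)^{-1} = (X^T X)^{-1}
        m = np.linalg.solve(XtX, X.T @ z)
        beta = m + np.linalg.solve(L.T, rng.standard_normal(p))
        draws[t] = beta
    return draws
```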
GPU-accelerated Gibbs Sampling
Data Augmentation Gibbs Sampler
zi | β ∼ TN(xiβ, 1, yi)
β | z ∼ N((XᵀX)⁻¹Xᵀz, (XᵀX)⁻¹)
Both steps are amenable to GPU-based parallelism
- Draw β | z in parallel: use Cholesky decomposition
- Draw z | β in parallel: zi ⊥⊥ z−i for all i by exchangeability
- Sufficient fine-grained parallelism in Xβ, Xᵀz, Chol(XᵀX)
- Some tricks used to control warp divergence in the TN kernel (one branch-free option is sketched below)
- Overlap computation and output: write β to disk while updating z
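The slide does not spell out the TN-kernel tricks; one standard branch-free formulation is the inverse-CDF draw below, in which every element executes the same arithmetic regardless of yi. This is an illustrative NumPy/SciPy sketch, not the talk's CUDA kernel.

```python
# Branch-free truncated-normal draw by inverse-CDF: identical instructions for
# every element, which avoids the if/else pattern that causes warp divergence.
import numpy as np
from scipy.special import ndtr, ndtri    # standard normal CDF and its inverse

def sample_z_branch_free(mu, y, rng):
    lower = np.where(y == 1, 0.0, -np.inf)   # truncation region per observation
    upper = np.where(y == 1, np.inf, 0.0)
    lo = ndtr(lower - mu)                    # CDF at the standardized bounds
    hi = ndtr(upper - mu)
    u = rng.uniform(size=mu.shape)
    return mu + ndtri(lo + u * (hi - lo))    # inverse-CDF sample in the region
```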
GPU-accelerated Gibbs Sampling
Data Augmentation Gibbs Sampler
zi | β ∼ TN(xiβ, 1, yi)
β | z ∼ N((XᵀX)⁻¹Xᵀz, (XᵀX)⁻¹)
What if we add a hierarchical prior such as the Horseshoe?

β | λ ∼ N(0, λ²)
λ | ν ∼ π(λ | ν)
ν ∼ π(ν | η)

Hierarchical priors factorize: update λ | − and ν | − in parallel (see the sketch after the bullets below)
- If GPU is not saturated, the computation is essentially free
- More complicated model: more available parallelism
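Because each local scale depends only on its own coefficient, these updates are element-wise and map naturally to one GPU thread per coefficient. The sketch below assumes the common inverse-gamma scale-mixture representation of the half-Cauchy Horseshoe prior (λj² | νj ∼ IG(1/2, 1/νj), νj ∼ IG(1/2, 1)); the slide does not give its exact parameterization, so these conditionals are illustrative rather than the talk's.

```python
# Vectorized Horseshoe-style local-scale updates under the inverse-gamma
# scale-mixture representation (an assumed parameterization, see lead-in).
import numpy as np

def update_local_scales(beta, nu, rng):
    # lambda_j^2 | beta_j, nu_j ~ IG(1, 1/nu_j + beta_j^2 / 2), independent over j;
    # an IG(a, b) draw is b / Gamma(a, 1)
    lam2 = (1.0 / nu + 0.5 * beta ** 2) / rng.gamma(1.0, size=beta.shape)
    # nu_j | lambda_j^2 ~ IG(1, 1 + 1/lambda_j^2), independent over j
    nu = (1.0 + 1.0 / lam2) / rng.gamma(1.0, size=beta.shape)
    return lam2, nu
```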
GPU-accelerated Performance
Horseshoe Probit Regression
[Figure: CPU and GPU run time (in minutes) for 10,000 Monte Carlo iterations of Horseshoe Probit Regression, plotted against data size (10,000 to 1,000,000), for dimensions 100, 1,000, and 10,000 on the GPU and dimensions 100 and 1,000 on a workstation and a laptop.]
It’s lightning fast and requires no new theory
- N = 10,000, p = 1,000: 90 minutes → 41 seconds
Conclusions
Bayesian problems can benefit immensely from hardware acceleration
- External GPUs, like the Akitio Node, are making this accessible
MCMC is both inherently sequential and massively parallelizable
- Not well-studied, lots of potential for new results
- Stay tuned: minibatch-based MCMC possible in continuous time
A. Terenin, S. Dong, and D. Draper. GPU-accelerated Gibbs Sampling. arXiv:1608.04329, 2016.