SLIDE 1

Open Problems in Deep Learning: A Bayesian solution

Dmitry P. Vetrov
Research professor at HSE, Head of Bayesian methods research group
http://bayesgroup.ru

SLIDE 2

Idea of the talk

SLIDE 3

Deep Learning

  • Revolution in machine learning
  • Deep neural networks approach human intelligence on a number of problems
  • They may solve quite non-standard problems such as image2caption and artistic style transfer

SLIDE 4

Open problems in Deep Learning

  • Overfitting
    Neural networks are prone to catastrophic overfitting on noisy data
  • Interpretability
    Nobody knows HOW a neural network makes its decisions – crucial for healthcare and finance. Legislative restrictions are expected
  • Uncertainty estimation
    Current neural networks are very over-confident even when they make mistakes. In many applications (e.g. self-driving cars) it is important to estimate the uncertainty of a prediction
  • Adversarial examples
    Neural networks can be easily fooled by barely visible perturbations of the data

SLIDES 5-8

Bayesian framework

  • Treats everything as a random variable
  • Allows us to encode our ignorance in terms of distributions
  • Makes use of the Bayes theorem (stated below)
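The slides only name the theorem; for reference, its standard statement for parameters \theta and observed data D is

    p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
    \qquad
    p(D) \;=\; \int p(D \mid \theta)\, p(\theta)\, d\theta .

The prior p(\theta) encodes our ignorance before seeing the data; the posterior p(\theta \mid D) encodes what remains of it afterwards.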

SLIDE 9

Frequentist vs. Bayesian frameworks

SLIDE 10

Frequentist vs. Bayesian frameworks

  • It can be shown that the posterior concentrates at the maximum-likelihood estimate as the data outgrow the parameters (the limit is written out below)
  • In other words, the frequentist framework is a limit case of the Bayesian one!
  • The number of tunable parameters d in modern ML models is comparable with the size n of the training data
  • We have no choice but to be Bayesian!
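The formula the first bullet refers to is lost in this transcript; the usual statement of the limit, with n data points and d parameters, is

    \lim_{n/d \to \infty} p(\theta \mid D) \;=\; \delta\!\left(\theta - \theta_{\mathrm{ML}}\right),
    \qquad
    \theta_{\mathrm{ML}} \;=\; \arg\max_{\theta} p(D \mid \theta) .

Intuitively, the log-posterior is a sum of n likelihood terms plus a single fixed log-prior term, so the likelihood dominates as n/d grows and the posterior collapses onto the maximum-likelihood point. When d is comparable to n, as the next bullet says, the limit does not apply and the two frameworks genuinely differ.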

SLIDE 11

Bayesian neural networks

  • In Bayesian DNNs we treat the weights ω of the neural network as random variables
  • First we define a reasonable prior p(ω)
  • Next, given the training data D, we perform Bayesian inference and derive the posterior p(ω | D) (a minimal sketch follows)
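A toy numpy version of these three steps (illustrative, not from the talk; the architecture, prior, and noise level are assumptions), computing the unnormalized log-posterior log p(ω | D) ∝ log p(D | ω) + log p(ω) for a one-hidden-layer regression network:

    import numpy as np

    # Minimal sketch: the weights ω live in the flat 7-vector w
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 1))                 # toy training inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)

    def forward(w, X):
        W1, b1, W2, b2 = w[:2], w[2:4], w[4:6], w[6]
        h = np.tanh(X @ W1[None, :] + b1)            # hidden layer, 2 units
        return h @ W2 + b2

    def log_posterior(w, sigma=0.1):
        log_prior = -0.5 * np.sum(w ** 2)            # prior p(ω) = N(0, I)
        resid = y - forward(w, X)
        log_lik = -0.5 * np.sum(resid ** 2) / sigma ** 2  # Gaussian noise model
        return log_lik + log_prior                   # ∝ log p(ω | D)

    print(log_posterior(rng.standard_normal(7)))

The normalizing constant p(D) is intractable for real networks, which is why the approximate-inference machinery on the following slides is needed.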

SLIDES 12-16

Advantages of the Bayesian framework

  • Regularization
    Prevents overfitting on the training data, because the prior does not allow the parameters to be tuned too aggressively
  • Extensibility
    Bayesian inference results in a posterior, which can then be used as the prior for the next model
  • Ensembling
    The posterior distribution over the weights defines an ensemble of neural networks rather than a single network
  • Model selection
    Automatically selects the simplest possible model that explains the observed data, thus implementing Occam's razor
  • Scalability
    Stochastic variational inference allows us to approximate posteriors using deep neural networks (a minimal sketch follows)
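A dependency-free toy version of stochastic variational inference (illustrative, not from the talk): fit q(ω) = N(μ, σ²) to the posterior of a 1-D Gaussian-mean model – prior ω ~ N(0, 1), observations ~ N(ω, 1) – by Monte-Carlo ascent on the ELBO with the reparameterization trick ω = μ + σε. The conjugate model is chosen so the answer can be checked; finite differences with shared random numbers stand in for the autodiff a real BNN would use:

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(2.0, 1.0, size=100)            # toy observations

    def elbo(mu, log_sigma, eps):
        sigma = np.exp(log_sigma)
        w = mu + sigma * eps                         # reparameterized samples
        log_lik = np.mean([-0.5 * np.sum((data - wi) ** 2) for wi in w])
        kl = 0.5 * (mu ** 2 + sigma ** 2 - 1.0) - log_sigma  # KL(q || prior)
        return log_lik - kl                          # Monte-Carlo ELBO

    mu, log_sigma, lr, h = 0.0, 0.0, 1e-3, 1e-4
    for step in range(2000):
        eps = rng.standard_normal(32)                # shared across both evals
        g_mu = (elbo(mu + h, log_sigma, eps) - elbo(mu - h, log_sigma, eps)) / (2 * h)
        g_ls = (elbo(mu, log_sigma + h, eps) - elbo(mu, log_sigma - h, eps)) / (2 * h)
        mu, log_sigma = mu + lr * g_mu, log_sigma + lr * g_ls

    n = data.size                                    # exact conjugate posterior
    print(mu, np.exp(log_sigma))                     # fitted q(ω)
    print(n * data.mean() / (n + 1), (1.0 / (n + 1)) ** 0.5)

The same recipe – sample, estimate the ELBO, follow its stochastic gradient – is what scales to millions of network weights.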

SLIDE 17

Dropout

  • Purely heuristic regularization procedure
  • Injects either Bernoulli or Gaussian noise into the weights during training
  • The magnitudes of the noise are set manually (both noise types are sketched below)
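A minimal sketch of the two noise types (illustrative; p and alpha below are exactly the manually chosen magnitudes being criticized):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 3))                  # some layer's weights
    p, alpha = 0.5, 0.5                              # hand-picked noise levels

    # Bernoulli dropout: zero a weight with probability p,
    # rescaling by 1/(1-p) to keep the expectation unchanged
    mask = rng.binomial(1, 1 - p, size=W.shape) / (1 - p)
    W_bernoulli = W * mask

    # Gaussian dropout: multiply by noise with mean 1 and variance alpha
    noise = 1.0 + np.sqrt(alpha) * rng.standard_normal(W.shape)
    W_gaussian = W * noise

    print(W_bernoulli, W_gaussian, sep="\n\n")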

SLIDE 18

Bayesian dropout

  • Theoretically justified procedure
  • Corresponds to training a Bayesian ensemble under a specific but interpretable prior
  • Allows dropout rates to be defined automatically (see the sketch below)
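The key identity (assuming the Gaussian-dropout formulation of variational dropout, as in Kingma et al., 2015; the numbers below are illustrative): multiplying a weight by N(1, α) noise is the same as sampling it from q(ω) = N(θ, αθ²), so α becomes a variational parameter learned by the same ELBO ascent as in the earlier sketch, and the dropout rate falls out of it:

    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.standard_normal((4, 3))              # variational weight means
    log_alpha = np.full((4, 3), -1.0)                # learned, not hand-picked

    eps = rng.standard_normal(theta.shape)
    w = theta * (1.0 + np.sqrt(np.exp(log_alpha)) * eps)   # w ~ q(ω)

    # Equivalent per-weight Bernoulli dropout rate: p = α / (1 + α)
    p = np.exp(log_alpha) / (1.0 + np.exp(log_alpha))
    print(p)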

SLIDE 19

Visualization

[Figure: LeNet-5, convolutional layer; LeNet-5, fully-connected layer (100 x 100 patch)]

SLIDE 20

Avoiding narrow extrema

  • [Stochastic variational] Bayesian inference corresponds to the injection of noise into the gradients
  • The larger the noise, the lower the spatial resolution
  • A Bayesian DNN simply DOES NOT SEE narrow local minima (a toy illustration follows)
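A toy numeric illustration (not from the talk) of the claim: averaging the loss over weight noise – which is effectively what noisy-gradient training optimizes – erases a narrow minimum while leaving a wide one in place:

    import numpy as np

    rng = np.random.default_rng(0)

    def loss(w):
        narrow = -1.5 * np.exp(-(w / 0.05) ** 2)         # deep but very narrow
        wide = -1.0 * np.exp(-((w - 3.0) / 1.0) ** 2)    # shallower but wide
        return narrow + wide

    def smoothed_loss(w, sigma=0.3, n=10000):
        eps = sigma * rng.standard_normal(n)             # weight noise
        return loss(w + eps).mean()                      # E[loss(w + ε)]

    ws = np.linspace(-1.0, 5.0, 601)
    raw = loss(ws)
    smooth = np.array([smoothed_loss(w) for w in ws])
    print("raw argmin:   ", ws[raw.argmin()])     # ≈ 0.0: narrow minimum wins
    print("smooth argmin:", ws[smooth.argmin()])  # ≈ 3.0: it has vanished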

SLIDE 21

Avoiding catastrophic overfitting

  • Bayesian model selection procedures effectively apply the well-known Occam's razor (the criterion is written out below)
  • They search for the simplest model capable of explaining the training data
  • If there are no dependencies between inputs and outputs, a Bayesian DNN will never be able to learn them, since there always exists a simpler NULL-model

"With all things being equal, the simplest explanation tends to be the right one."
– William of Ockham
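The criterion implementing the razor is the model evidence (marginal likelihood): models are compared by how much probability they assign to the observed data with the parameters integrated out,

    p(D \mid M) \;=\; \int p(D \mid \omega, M)\, p(\omega \mid M)\, d\omega,
    \qquad
    M^{*} \;=\; \arg\max_{M} p(D \mid M) .

A flexible model spreads its prior mass over many possible datasets, so when the data carry no input-output dependency the simpler NULL-model assigns them higher evidence and wins.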

SLIDE 22

Ensembles of ML algorithms

  • If we have several ML algorithms, their average is generally better than the application of the single best one
  • The problem is that we need to train and keep them all in memory
  • Such a technique is not scalable!
  • Bayesian ensembles are very compact (yet consist of a continuum of elements): you only need to sample from the posterior (a minimal sketch follows)

[Figure: accuracy of single algorithms vs. the single best algorithm vs. the ensemble]
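A minimal sketch (illustrative, not from the talk) of why the Bayesian ensemble is compact: nothing is stored but the posterior itself, and predictions average over samples drawn from it, p(y | x, D) ≈ (1/S) Σ_s p(y | x, ω_s). The toy "network" and the faked posterior samples are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def predict(w, x):
        return 1.0 / (1.0 + np.exp(-x @ w))          # sigmoid "network"

    w_map = np.array([1.0, -2.0, 0.5])               # hypothetical point estimate
    samples = w_map + 0.3 * rng.standard_normal((100, 3))  # stand-in for p(ω|D)

    x = np.array([0.2, 0.1, -0.4])
    p_single = predict(w_map, x)
    p_ensemble = np.mean([predict(w, x) for w in samples])
    print(p_single, p_ensemble)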

SLIDE 23

Real data example

SLIDE 24

Robustness to adversarial attacks

  • Adversarial examples are another problem with DNNs
  • Single DNNs are very sensitive to adversarial attacks
  • Ensembles of a continuum of DNNs almost cannot be fooled (a sketch of the attack follows)

[Figure: a slightly perturbed "panda" image is classified as "gibbon"]
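For concreteness, a minimal sketch (illustrative, not from the talk) of the FGSM-style perturbation behind the panda/gibbon picture, on a toy logistic model: nudge the input in the direction that most increases the loss, x_adv = x + ε · sign(∇_x loss):

    import numpy as np

    def predict(w, x):
        return 1.0 / (1.0 + np.exp(-x @ w))          # toy logistic "network"

    def input_grad(w, x, y):
        return (predict(w, x) - y) * w               # ∇_x of the log-loss

    w = np.array([2.0, -1.0])                        # hypothetical weights
    x, y, eps = np.array([1.0, 1.0]), 1.0, 0.25
    x_adv = x + eps * np.sign(input_grad(w, x, y))
    print(predict(w, x), "->", predict(w, x_adv))    # confidence drops

    # Against a Bayesian ensemble the attacker must fool the posterior-
    # averaged prediction E_{ω~p(ω|D)}[p(y | x, ω)] rather than a single
    # network, which is the robustness claim made on this slide.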

SLIDE 25

Setting desirable properties

By selecting the proper prior we may encourage desired properties in a Bayesian DNN:

  • Sparsity (compression; see the pruning sketch after this list)
  • Group sparsity (acceleration)
  • Rich ensembles (improved final accuracy, better uncertainty estimation)
  • Reliability (robustness to adversarial attacks)
  • Interpretability (hard attention maps)

Techniques expected to become Bayesian soon:

  • GANs
  • Normalization algorithms (batchnorm, weightnorm, etc.)
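How a sparsity-inducing prior turns into compression (assuming the sparse variational dropout recipe, e.g. Molchanov et al., 2017; the threshold and numbers are illustrative): training drives the per-weight noise-to-signal ratio α of unneeded weights to be huge, and those weights are pruned because the posterior says they are pure noise:

    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.standard_normal(10)                  # posterior weight means
    log_alpha = rng.uniform(-5.0, 5.0, size=10)      # learned log noise ratios

    keep = log_alpha < 3.0                # common rule: prune where α >> 1
    w_compressed = np.where(keep, theta, 0.0)
    print("kept", keep.sum(), "of", theta.size, "weights")
    print(w_compressed)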

SLIDE 26

Conclusions

  • The Bayesian framework is extremely powerful and extends the ML toolbox
  • We do have scalable algorithms for approximate Bayesian inference
  • Bayes + Deep Learning =
  • Even the first attempts at NeuroBayesian inference give impressive results
  • Summer school on NeuroBayesian methods, August 2018, Moscow, http://deepbayes.ru